Compare commits

No commits in common. "master" and "spark-operator-chart-1.1.10" have entirely different histories.

399 changed files with 41857 additions and 81939 deletions

@@ -1,7 +1 @@
.idea/
.vscode/
bin/
codecov.yaml
cover.out
.DS_Store
*.iml
vendor

@@ -1,54 +0,0 @@
name: Bug Report
description: Tell us about a problem you are experiencing with the Spark operator.
labels:
- kind/bug
- lifecycle/needs-triage
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this Spark operator bug report!
- type: textarea
id: problem
attributes:
label: What happened?
description: |
Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration.
If your request is for a new feature, please use the `Feature request` template.
value: |
- [ ] ✋ I have searched the open/closed issues and my issue is not listed.
validations:
required: true
- type: textarea
id: reproduce
attributes:
label: Reproduction Code
description: Steps to reproduce the behavior.
- type: textarea
id: expected
attributes:
label: Expected behavior
description: A clear and concise description of what you expected to happen.
- type: textarea
id: actual
attributes:
label: Actual behavior
description: A clear and concise description of what actually happened.
- type: textarea
id: environment
attributes:
label: Environment & Versions
value: |
- Kubernetes Version:
- Spark Operator Version:
- Apache Spark Version:
- type: textarea
id: context
attributes:
label: Additional context
description: Add any other context about the problem here.
- type: input
id: votes
attributes:
label: Impacted by this bug?
value: Give it a 👍. We prioritize the issues with the most 👍.

@@ -1,9 +0,0 @@
blank_issues_enabled: true
contact_links:
- name: Spark Operator Documentation
url: https://www.kubeflow.org/docs/components/spark-operator
about: Much help can be found in the docs
- name: Spark Operator Slack Channel
url: https://app.slack.com/client/T08PSQ7BQ/C074588U7EG
about: Ask questions about the Spark Operator

@@ -1,47 +0,0 @@
name: Feature Request
description: Suggest an idea for the Spark operator.
labels:
- kind/feature
- lifecycle/needs-triage
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this Spark operator feature request!
- type: markdown
attributes:
value: |
- Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the community and maintainers prioritize this request.
- Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- type: textarea
id: feature
attributes:
label: What feature would you like to be added?
description: |
A clear and concise description of what you want to add to the Spark operator.
Please consider writing a Spark operator enhancement proposal if it is a large feature request.
validations:
required: true
- type: textarea
id: rationale
attributes:
label: Why is this needed?
- type: textarea
id: solution
attributes:
label: Describe the solution you would like
- type: textarea
id: alternatives
attributes:
label: Describe alternatives you have considered
- type: textarea
id: context
attributes:
label: Additional context
description: Add any other context or screenshots about the feature request here.
- type: input
id: votes
attributes:
label: Love this feature?
value: Give it a 👍. We prioritize the features with the most 👍.

@@ -1,30 +0,0 @@
name: Question
description: Ask a question about the Spark operator.
labels:
- kind/question
- lifecycle/needs-triage
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this question!
- type: textarea
id: feature
attributes:
label: What question do you want to ask?
description: |
A clear and concise description of what you want to ask about the Spark operator.
value: |
- [ ] ✋ I have searched the open/closed issues and my issue is not listed.
validations:
required: true
- type: textarea
id: rationale
attributes:
label: Additional context
description: Add any other context or screenshots about the question here.
- type: input
id: votes
attributes:
label: Have the same question?
value: Give it a 👍. We prioritize the questions with the most 👍.

@@ -1,42 +0,0 @@
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, check our contributor guidelines: https://www.kubeflow.org/docs/about/contributing
2. To know more about how to develop with the Spark operator, check the developer guide: https://www.kubeflow.org/docs/components/spark-operator/developer-guide/
3. If you want *faster* PR reviews, check how: https://git.k8s.io/community/contributors/guide/pull-requests.md#best-practices-for-faster-reviews
4. Please open an issue to discuss significant work before you start. We appreciate your contributions and don't want your efforts to go to waste!
-->
## Purpose of this PR
<!-- Provide a clear and concise description of the changes. Explain the motivation behind these changes and link to relevant issues or discussions. -->
**Proposed changes:**
- <Change 1>
- <Change 2>
- <Change 3>
## Change Category
<!-- Indicate the type of change by marking the applicable boxes. -->
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] Feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that could affect existing functionality)
- [ ] Documentation update
### Rationale
<!-- Provide reasoning for the changes if not already covered in the description above. -->
## Checklist
<!-- Before submitting your PR, please review the following: -->
- [ ] I have conducted a self-review of my own code.
- [ ] I have updated documentation accordingly.
- [ ] I have added tests that prove my changes are effective or that my feature works.
- [ ] Existing unit tests pass locally with my changes.
### Additional Notes
<!-- Include any additional notes or context that could be helpful for the reviewers here. -->

@@ -1,16 +0,0 @@
version: 2
updates:
- package-ecosystem: "gomod"
directory: "/"
schedule:
interval: "weekly"
- package-ecosystem: "docker"
directory: "/"
schedule:
interval: "weekly"
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"

@@ -1,64 +0,0 @@
name: Check Release
on:
pull_request:
branches:
- release-*
paths:
- VERSION
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
SEMVER_PATTERN: '^v([0-9]+)\.([0-9]+)\.([0-9]+)(-rc\.([0-9]+))?$'
jobs:
check:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check whether version matches semver pattern
run: |
VERSION=$(cat VERSION)
if [[ ${VERSION} =~ ${{ env.SEMVER_PATTERN }} ]]; then
echo "Version '${VERSION}' matches semver pattern."
else
echo "Version '${VERSION}' does not match semver pattern."
exit 1
fi
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Check whether chart version and appVersion matches version
run: |
VERSION=${VERSION#v}
CHART_VERSION=$(grep '^version:' charts/spark-operator-chart/Chart.yaml | awk '{print $2}')
CHART_APP_VERSION=$(grep '^appVersion:' charts/spark-operator-chart/Chart.yaml | awk '{print $2}')
if [[ ${CHART_VERSION} == ${VERSION} ]]; then
echo "Chart version '${CHART_VERSION}' matches version '${VERSION}'."
else
echo "Chart version '${CHART_VERSION}' does not match version '${VERSION}'."
exit 1
fi
if [[ ${CHART_APP_VERSION} == ${VERSION} ]]; then
echo "Chart appVersion '${CHART_APP_VERSION}' matches version '${VERSION}'."
else
echo "Chart appVersion '${CHART_APP_VERSION}' does not match version '${VERSION}'."
exit 1
fi
- name: Check if tag exists
run: |
git fetch --tags
if git tag -l | grep -q "^${VERSION}$"; then
echo "Tag '${VERSION}' already exists."
exit 1
else
echo "Tag '${VERSION}' does not exist."
fi
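
For reference, the SEMVER_PATTERN above accepts only tags with a leading "v", three numeric components, and an optional "-rc.N" suffix. A quick local sketch of what it matches (plain bash; BASH_REMATCH holds the captured major/minor/patch on a match):

SEMVER_PATTERN='^v([0-9]+)\.([0-9]+)\.([0-9]+)(-rc\.([0-9]+))?$'
for v in v2.2.1 v2.2.1-rc.1 2.2.1 v2.2; do
  if [[ ${v} =~ ${SEMVER_PATTERN} ]]; then
    echo "${v}: match (major=${BASH_REMATCH[1]})"
  else
    echo "${v}: no match"
  fi
done
# v2.2.1 and v2.2.1-rc.1 match; 2.2.1 (no leading v) and v2.2 (no patch) do not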

@@ -1,233 +0,0 @@
name: Integration Test
on:
pull_request:
branches:
- master
- release-*
push:
branches:
- master
- release-*
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.actor }}
cancel-in-progress: true
jobs:
code-check:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Run go mod tidy
run: |
go mod tidy
if ! git diff --quiet; then
echo "Please run 'go mod tidy' and commit the changes."
git diff
false
fi
- name: Generate code
run: |
make generate
if ! git diff --quiet; then
echo "Need to re-run 'make generate' and commit the changes."
git diff
false
fi
- name: Verify Codegen
run: |
make verify-codegen
- name: Run go fmt check
run: |
make go-fmt
if ! git diff --quiet; then
echo "Need to re-run 'make go-fmt' and commit the changes."
git diff
false
fi
- name: Run go vet check
run: |
make go-vet
if ! git diff --quiet; then
echo "Need to re-run 'make go-vet' and commit the changes."
git diff
false
fi
- name: Run golangci-lint
run: |
make go-lint
build-api-docs:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Build API docs
run: |
make build-api-docs
if ! git diff --quiet; then
echo "Need to re-run 'make build-api-docs' and commit the changes."
git diff
false
fi
build-spark-operator:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Run go unit tests
run: make unit-test
- name: Build Spark operator
run: make build-operator
build-helm-chart:
runs-on: ubuntu-latest
steps:
- name: Determine branch name
id: get_branch
run: |
BRANCH=""
if [ "${{ github.event_name }}" == "push" ]; then
BRANCH=${{ github.ref_name }}
elif [ "${{ github.event_name }}" == "pull_request" ]; then
BRANCH=${{ github.base_ref }}
fi
echo "Branch name: $BRANCH"
echo "BRANCH=$BRANCH" >> "$GITHUB_OUTPUT"
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install Helm
uses: azure/setup-helm@v4
with:
version: v3.14.3
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.7.0
- name: Generate manifests
run: |
make manifests
if ! git diff --quiet; then
echo "Need to re-run 'make manifests' and commit the changes."
git diff
false
fi
- name: Detect CRDs drift between chart and manifest
run: make detect-crds-drift
- name: Run helm unittest
run: make helm-unittest
- name: Run chart-testing (list-changed)
id: list-changed
env:
BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
run: |
changed=$(ct list-changed --target-branch $BRANCH)
if [[ -n "$changed" ]]; then
echo "changed=true" >> "$GITHUB_OUTPUT"
fi
- name: Run chart-testing (lint)
if: steps.list-changed.outputs.changed == 'true'
env:
BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
run: ct lint --check-version-increment=false --target-branch $BRANCH
- name: Produce the helm documentation
if: steps.list-changed.outputs.changed == 'true'
run: |
make helm-docs
if ! git diff --quiet -- charts/spark-operator-chart/README.md; then
echo "Need to re-run 'make helm-docs' and commit the changes."
false
fi
- name: setup minikube
if: steps.list-changed.outputs.changed == 'true'
uses: manusa/actions-setup-minikube@v2.14.0
with:
minikube version: v1.33.0
kubernetes version: v1.30.0
start args: --memory 6g --cpus=2 --addons ingress
github token: ${{ inputs.github-token }}
- name: Run chart-testing (install)
if: steps.list-changed.outputs.changed == 'true'
run: |
docker build -t ghcr.io/kubeflow/spark-operator/controller:local .
minikube image load ghcr.io/kubeflow/spark-operator/controller:local
ct install --target-branch ${{ steps.get_branch.outputs.BRANCH }}
e2e-test:
runs-on: ubuntu-latest
strategy:
matrix:
k8s_version:
- v1.24.17
- v1.25.16
- v1.26.15
- v1.27.16
- v1.28.15
- v1.29.12
- v1.30.8
- v1.31.4
- v1.32.0
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Create a Kind cluster
run: make kind-create-cluster KIND_K8S_VERSION=${{ matrix.k8s_version }}
- name: Build and load image to Kind cluster
run: make kind-load-image IMAGE_TAG=local
- name: Run e2e tests
run: make e2e-test
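
The jobs above can be approximated locally before pushing; a sketch that assumes the Makefile targets referenced in this workflow (go-fmt, go-vet, unit-test, kind-create-cluster, kind-load-image, e2e-test) and a running Docker daemon:

make go-fmt go-vet unit-test
git diff --quiet || { echo "generated or formatted files are out of date" >&2; exit 1; }
make kind-create-cluster KIND_K8S_VERSION=v1.32.0   # any version from the matrix above
make kind-load-image IMAGE_TAG=local
make e2e-test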

.github/workflows/main.yaml (new file)

@@ -0,0 +1,75 @@
name: Pre-commit checks
on:
pull_request:
branches:
- master
push:
branches:
- master
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.15
- name: Checkout source code
uses: actions/checkout@v2
with:
fetch-depth: 2
- name: fmt check
run: make fmt-check
- name: unit tests
run: make test
it:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.15
- name: Checkout source code
uses: actions/checkout@v2
with:
fetch-depth: 2
- name: Install Helm
uses: azure/setup-helm@v1
with:
version: v3.4.0
- uses: actions/setup-python@v2
with:
python-version: 3.7
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.0.1
- name: Print ct version information and List files
run: ct version && ls -lh
- name: Run chart-testing (lint)
run: ct lint
- name: Detect CRDs drift between chart and manifest
run: make detect-crds-drift
- name: Create kind cluster
uses: helm/kind-action@v1.2.0
- name: Run chart-testing (install)
run: ct install
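
The chart-testing steps can also be run outside CI; a minimal local sketch, assuming Helm, ct, and a kind (or other) cluster are already available:

ct version
ct lint --all      # lint every chart under the chart dirs, not only changed ones
ct install --all   # install and test every chart against the current kube context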

@@ -1,84 +0,0 @@
name: Release Helm charts
on:
release:
types:
- published
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
HELM_REGISTRY: ghcr.io
HELM_REPOSITORY: ${{ github.repository_owner }}/helm-charts
jobs:
release_helm_charts:
permissions:
contents: write
packages: write
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
- name: Configure Git
run: |
git config user.name "$GITHUB_ACTOR"
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Set up Helm
uses: azure/setup-helm@v4.2.0
with:
version: v3.14.4
- name: Login to GHCR
uses: docker/login-action@v3
with:
registry: ${{ env.HELM_REGISTRY }}
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Read version from VERSION file
run: |
VERSION=$(cat VERSION)
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Package Helm charts
run: |
for chart in $(ls charts); do
helm package charts/${chart}
done
- name: Upload charts to GHCR
run: |
for pkg in $(ls *.tgz); do
helm push ${pkg} oci://${{ env.HELM_REGISTRY }}/${{ env.HELM_REPOSITORY }}
done
- name: Save packaged charts to temp directory
run: |
mkdir -p /tmp/charts
cp *.tgz /tmp/charts
- name: Checkout to branch gh-pages
uses: actions/checkout@v4
with:
ref: gh-pages
fetch-depth: 0
- name: Copy packaged charts
run: |
cp /tmp/charts/*.tgz .
- name: Update Helm charts repo index
env:
CHART_URL: https://github.com/${{ github.repository }}/releases/download/${{ github.ref_name }}
run: |
helm repo index --merge index.yaml --url ${CHART_URL} .
git add index.yaml
git commit -s -m "Add index for Spark operator chart ${VERSION}" || exit 0
git push
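
Once this workflow has run, the chart should be installable either from the GHCR OCI registry or from the index served on gh-pages; a usage sketch (the chart name spark-operator and the gh-pages URL are assumptions):

helm install spark-operator oci://ghcr.io/kubeflow/helm-charts/spark-operator --version <chart version>
# or via the classic repo index on gh-pages:
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm install spark-operator spark-operator/spark-operator --version <chart version>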

@@ -1,260 +1,56 @@
name: Release
name: Release Charts
on:
push:
branches:
- release-*
paths:
- VERSION
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
SEMVER_PATTERN: '^v([0-9]+)\.([0-9]+)\.([0-9]+)(-rc\.([0-9]+))?$'
IMAGE_REGISTRY: ghcr.io
IMAGE_REPOSITORY: kubeflow/spark-operator/controller
- master
jobs:
check-release:
release:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check whether version matches semver pattern
run: |
VERSION=$(cat VERSION)
if [[ ${VERSION} =~ ${{ env.SEMVER_PATTERN }} ]]; then
echo "Version '${VERSION}' matches semver pattern."
else
echo "Version '${VERSION}' does not match semver pattern."
exit 1
fi
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Check whether chart version and appVersion matches version
run: |
VERSION=${VERSION#v}
CHART_VERSION=$(grep '^version:' charts/spark-operator-chart/Chart.yaml | awk '{print $2}')
CHART_APP_VERSION=$(grep '^appVersion:' charts/spark-operator-chart/Chart.yaml | awk '{print $2}')
if [[ ${CHART_VERSION} == ${VERSION} ]]; then
echo "Chart version '${CHART_VERSION}' matches version '${VERSION}'."
else
echo "Chart version '${CHART_VERSION}' does not match version '${VERSION}'."
exit 1
fi
if [[ ${CHART_APP_VERSION} == ${VERSION} ]]; then
echo "Chart appVersion '${CHART_APP_VERSION}' matches version '${VERSION}'."
else
echo "Chart appVersion '${CHART_APP_VERSION}' does not match version '${VERSION}'."
exit 1
fi
- name: Check if tag exists
run: |
git fetch --tags
if git tag -l | grep -q "^${VERSION}$"; then
echo "Tag '${VERSION}' already exists."
exit 1
else
echo "Tag '${VERSION}' does not exist."
fi
build_images:
needs:
- check-release
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
platform:
- linux/amd64
- linux/arm64
steps:
- name: Prepare
run: |
platform=${{ matrix.platform }}
echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
- name: Checkout source code
uses: actions/checkout@v4
- name: Read version from VERSION file
run: |
VERSION=$(cat VERSION)
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}
tags: |
type=semver,pattern={{version}},value=${{ env.VERSION }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker buildx
uses: docker/setup-buildx-action@v3
- name: Login to container registry
uses: docker/login-action@v3
with:
registry: ${{ env.IMAGE_REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push by digest
id: build
uses: docker/build-push-action@v6
with:
platforms: ${{ matrix.platform }}
labels: ${{ steps.meta.outputs.labels }}
outputs: type=image,name=${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }},push-by-digest=true,name-canonical=true,push=true
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-${{ env.PLATFORM_PAIR }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
release_images:
needs:
- build_images
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
- name: Read version from VERSION file
run: |
VERSION=$(cat VERSION)
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}
tags: |
type=semver,pattern={{version}},value=${{ env.VERSION }}
- name: Download digests
uses: actions/download-artifact@v4
with:
path: /tmp/digests
pattern: digests-*
merge-multiple: true
- name: Set up Docker buildx
uses: docker/setup-buildx-action@v3
- name: Login to container registry
uses: docker/login-action@v3
with:
registry: ${{ env.IMAGE_REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
$(printf '${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}@sha256:%s ' *)
- name: Inspect image
run: |
docker buildx imagetools inspect ${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}:${{ steps.meta.outputs.version }}
push_tag:
needs:
- release_images
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Configure Git
run: |
git config user.name "$GITHUB_ACTOR"
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Read version from VERSION file
run: |
VERSION=$(cat VERSION)
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Create and push tag
run: |
git tag -a "${VERSION}" -m "Spark Operator Official Release ${VERSION}"
git push origin "${VERSION}"
draft_release:
needs:
- push_tag
permissions:
contents: write
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Configure Git
run: |
git config user.name "$GITHUB_ACTOR"
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Read version from VERSION file
run: |
VERSION=$(cat VERSION)
echo "VERSION=${VERSION}" >> $GITHUB_ENV
- name: Set up Helm
uses: azure/setup-helm@v4.2.0
- name: Install Helm
uses: azure/setup-helm@v1
with:
version: v3.14.4
version: v3.4.0
- name: Package Helm charts
run: |
for chart in $(ls charts); do
helm package charts/${chart}
done
- name: Release
id: release
uses: softprops/action-gh-release@v2
- uses: actions/setup-python@v2
with:
token: ${{ secrets.GITHUB_TOKEN }}
name: "Spark Operator ${{ env.VERSION }}"
tag_name: ${{ env.VERSION }}
prerelease: ${{ contains(env.VERSION, 'rc') }}
target_commitish: ${{ github.sha }}
draft: true
files: |
*.tgz
python-version: 3.7
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.0.1
- name: Run chart-testing (list-changed)
id: list-changed
run: |
changed=$(ct list-changed)
if [[ -n "$changed" ]]; then
echo "::set-output name=changed::true"
fi
- name: Run chart-testing (lint)
run: ct lint
- name: Create kind cluster
uses: helm/kind-action@v1.2.0
if: steps.list-changed.outputs.changed == 'true'
- name: Run chart-testing (install)
run: ct install
- name: Run chart-releaser
uses: helm/chart-releaser-action@v1.1.0
env:
CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
CR_RELEASE_NAME_TEMPLATE: "spark-operator-chart-{{ .Version }}"
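
For reference, the new workflow pushes each platform image by digest only and then stitches the digests into a single multi-arch tag with buildx imagetools; conceptually (digests illustrative):

docker buildx imagetools create \
  -t ghcr.io/kubeflow/spark-operator/controller:2.2.1 \
  ghcr.io/kubeflow/spark-operator/controller@sha256:<amd64-digest> \
  ghcr.io/kubeflow/spark-operator/controller@sha256:<arm64-digest>
docker buildx imagetools inspect ghcr.io/kubeflow/spark-operator/controller:2.2.1   # lists both platforms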

@@ -1,39 +0,0 @@
name: Mark stale issues and pull requests
on:
schedule:
- cron: "0 */2 * * *"
jobs:
stale:
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v9
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
days-before-stale: 90
days-before-close: 20
operations-per-run: 200
stale-issue-message: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
close-issue-message: >
This issue has been automatically closed because it has not had recent
activity. Please comment "/reopen" to reopen it.
stale-issue-label: lifecycle/stale
exempt-issue-labels: lifecycle/frozen
stale-pr-message: >
This pull request has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
close-pr-message: >
This pull request has been automatically closed because it has not had recent
activity. Please comment "/reopen" to reopen it.
stale-pr-label: lifecycle/stale
exempt-pr-labels: lifecycle/frozen

@@ -1,32 +0,0 @@
name: Trivy image scanning
on:
workflow_dispatch:
schedule:
- cron: '0 0 * * 1' # Every Monday at 00:00
jobs:
image-scanning:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Add image to environment
run: make print-IMAGE >> $GITHUB_ENV
- name: trivy scan for github security tab
uses: aquasecurity/trivy-action@0.32.0
with:
image-ref: '${{ env.IMAGE }}'
format: 'sarif'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'
output: 'trivy-results.sarif'
timeout: 30m0s
- name: Upload Trivy scan results to GitHub Security tab
uses: github/codeql-action/upload-sarif@v3
if: always()
with:
sarif_file: 'trivy-results.sarif'
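
The same scan can be reproduced locally against any published tag; a sketch using Trivy CLI flags that mirror the action inputs above:

trivy image --severity CRITICAL,HIGH --ignore-unfixed --vuln-type os,library \
  ghcr.io/kubeflow/spark-operator/controller:<tag>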

.gitignore

@@ -1,7 +1,9 @@
.idea/
.vscode/
bin/
codecov.yaml
cover.out
.DS_Store
*.iml
vendor/
spark-operator
.idea/
**/*.iml
sparkctl/sparkctl
spark-on-k8s-operator
sparkctl/sparkctl-linux-amd64
sparkctl/sparkctl-darwin-amd64

.gitlab-ci.yml (new file)

@@ -0,0 +1,30 @@
stages:
- build
variables:
DEP_VERSION: "0.5.3"
build:
stage: build
image: docker:stable
services:
- docker:dind
before_script:
- apk --no-cache add git
variables:
DOCKER_HOST: tcp://docker:2375
script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY ;
- >
if [ "${SPARK_REGISTRY}" != "" -a "${SPARK_VERSION}" != "" ] ; then
tagStamp=$(git describe --tags --dirty)_${SPARK_VERSION}
echo Using SPARK_IMAGE ${SPARK_REGISTRY}:${SPARK_VERSION}
echo CI_REGISTRY_IMAGE_TAG is ${CI_REGISTRY_IMAGE}/spark-operator:${tagStamp}
docker build --build-arg SPARK_IMAGE=${SPARK_REGISTRY}:${SPARK_VERSION} -t ${CI_REGISTRY_IMAGE}/spark-operator:${tagStamp} .
else
tagStamp=$(git describe --tags --dirty) ; echo tagStamp is ${tagStamp} ;
echo CI_REGISTRY_IMAGE_TAG is ${CI_REGISTRY_IMAGE}/spark-operator:${tagStamp}
docker build -t ${CI_REGISTRY_IMAGE}/spark-operator:${tagStamp} .
fi
- time docker push ${CI_REGISTRY_IMAGE}/spark-operator:${tagStamp}
- docker images
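
The image tag here comes from git describe, which encodes the nearest tag, the commit distance, and local modifications; an illustration (output values are examples):

git describe --tags --dirty
# spark-operator-chart-1.1.10                    on the tagged commit itself
# spark-operator-chart-1.1.10-3-gabc1234-dirty   three commits later, with uncommitted changes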

@@ -1,65 +0,0 @@
version: "2"
run:
# Timeout for total work, e.g. 30s, 5m, 5m30s.
# If the value is lower or equal to 0, the timeout is disabled.
# Default: 0 (disabled)
timeout: 2m
linters:
# Enable specific linters.
# https://golangci-lint.run/usage/linters/#enabled-by-default
enable:
# Detects places where loop variables are copied.
- copyloopvar
# Checks for duplicate words in the source code.
- dupword
# Tool for detection of FIXME, TODO and other comment keywords.
# - godox
# Enforces consistent import aliases.
- importas
# Find code that shadows one of Go's predeclared identifiers.
- predeclared
# Check that struct tags are well aligned.
- tagalign
# Remove unnecessary type conversions.
- unconvert
# Checks Go code for unused constants, variables, functions and types.
- unused
settings:
importas:
# List of aliases
alias:
- pkg: k8s.io/api/admissionregistration/v1
alias: admissionregistrationv1
- pkg: k8s.io/api/apps/v1
alias: appsv1
- pkg: k8s.io/api/batch/v1
alias: batchv1
- pkg: k8s.io/api/core/v1
alias: corev1
- pkg: k8s.io/api/extensions/v1beta1
alias: extensionsv1beta1
- pkg: k8s.io/api/networking/v1
alias: networkingv1
- pkg: k8s.io/apimachinery/pkg/apis/meta/v1
alias: metav1
- pkg: sigs.k8s.io/controller-runtime
alias: ctrl
issues:
# Maximum issues count per one linter.
# Set to 0 to disable.
# Default: 50
max-issues-per-linter: 50
# Maximum count of issues with the same text.
# Set to 0 to disable.
# Default: 3
max-same-issues: 3
formatters:
enable:
# Check import statements are formatted according to the 'goimport' command.
- goimports

@@ -1,10 +0,0 @@
repos:
- repo: https://github.com/norwoodj/helm-docs
rev: "v1.13.1"
hooks:
- id: helm-docs
args:
# Make the tool search for charts only under the `charts` directory
- --chart-search-root=charts
- --template-files=README.md.gotmpl
- --sort-values-order=file
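
The hook can also be invoked on demand rather than at commit time; a sketch assuming pre-commit is installed (e.g. via pip):

pre-commit install                      # wire the hooks into .git/hooks
pre-commit run helm-docs --all-files    # regenerate charts/*/README.md now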

.travis.gofmt.sh (new executable file)

@@ -0,0 +1,24 @@
#!/bin/bash
#
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
cd "$(dirname $0)"
if [ -n "$(go fmt ./...)" ];
then
echo "Go code is not formatted, please run 'go fmt ./...'." >&2
exit 1
else
echo "Go code is formatted"
fi
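
The check relies on go fmt printing the names of any files it rewrites, so non-empty output means the tree was not formatted. To list offending files without modifying them:

gofmt -l .   # prints files whose formatting differs from gofmt's output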

.travis.yml (new file)

@@ -0,0 +1,29 @@
#
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
language: go
go:
- 1.15.x
go_import_path: github.com/GoogleCloudPlatform/spark-on-k8s-operator
script:
- go test -v ./...
- ./.travis.gofmt.sh

@@ -1,723 +0,0 @@
# Changelog
## [v2.2.1](https://github.com/kubeflow/spark-operator/tree/v2.2.1) (2025-06-27)
### Features
- Customize ingress URL with Spark application ID ([#2554](https://github.com/kubeflow/spark-operator/pull/2554) by [@ChenYi015](https://github.com/ChenYi015))
- Make default ingress tls and annotations configurable in the helm config ([#2513](https://github.com/kubeflow/spark-operator/pull/2513) by [@Tom-Newton](https://github.com/Tom-Newton))
- Use code-generator for clientset, informers, listers ([#2563](https://github.com/kubeflow/spark-operator/pull/2563) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
### Misc
- add driver ingress unit tests ([#2552](https://github.com/kubeflow/spark-operator/pull/2552) by [@nabuskey](https://github.com/nabuskey))
- Get logger from context ([#2551](https://github.com/kubeflow/spark-operator/pull/2551) by [@ChenYi015](https://github.com/ChenYi015))
- Update golangci lint ([#2560](https://github.com/kubeflow/spark-operator/pull/2560) by [@joshuacuellar1](https://github.com/joshuacuellar1))
### Dependencies
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 ([#2557](https://github.com/kubeflow/spark-operator/pull/2557) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 ([#2548](https://github.com/kubeflow/spark-operator/pull/2548) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump sigs.k8s.io/scheduler-plugins from 0.30.6 to 0.31.8 ([#2549](https://github.com/kubeflow/spark-operator/pull/2549) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/mod from 0.24.0 to 0.25.0 ([#2566](https://github.com/kubeflow/spark-operator/pull/2566) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/go-logr/logr from 1.4.2 to 1.4.3 ([#2567](https://github.com/kubeflow/spark-operator/pull/2567) by [@dependabot[bot]](https://github.com/apps/dependabot))
## [v2.2.0](https://github.com/kubeflow/spark-operator/tree/v2.2.0) (2025-05-29)
### Features
- Upgrade to Spark 3.5.5 ([#2490](https://github.com/kubeflow/spark-operator/pull/2490) by [@jacobsalway](https://github.com/jacobsalway))
- Add timeZone to ScheduledSparkApplication ([#2471](https://github.com/kubeflow/spark-operator/pull/2471) by [@jacobsalway](https://github.com/jacobsalway))
- Enable the override of MemoryLimit through webhook ([#2478](https://github.com/kubeflow/spark-operator/pull/2478) by [@danielrsfreitas](https://github.com/danielrsfreitas))
- Add ShuffleTrackingEnabled to DynamicAllocation struct to allow disabling shuffle tracking ([#2511](https://github.com/kubeflow/spark-operator/pull/2511) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
- Define SparkApplicationSubmitter interface to allow customizing submitting mechanism ([#2500](https://github.com/kubeflow/spark-operator/pull/2500) by [@ChenYi015](https://github.com/ChenYi015))
- Add support for using cert manager to generate webhook certificates ([#2373](https://github.com/kubeflow/spark-operator/pull/2373) by [@ChenYi015](https://github.com/ChenYi015))
### Bug Fixes
- fix: add webhook cert validity checking ([#2489](https://github.com/kubeflow/spark-operator/pull/2489) by [@teejaded](https://github.com/teejaded))
- fix and add back unit tests ([#2532](https://github.com/kubeflow/spark-operator/pull/2532) by [@nabuskey](https://github.com/nabuskey))
- fix volcano tests ([#2533](https://github.com/kubeflow/spark-operator/pull/2533) by [@nabuskey](https://github.com/nabuskey))
- Add v2 to module path ([#2515](https://github.com/kubeflow/spark-operator/pull/2515) by [@ChenYi015](https://github.com/ChenYi015))
- #2525 spark metrics in depends on prometheus ([#2529](https://github.com/kubeflow/spark-operator/pull/2529) by [@blcksrx](https://github.com/blcksrx))
### Misc
- Add APRA AMCOS to adopters ([#2485](https://github.com/kubeflow/spark-operator/pull/2485) by [@shuch3ng](https://github.com/shuch3ng))
- Bump github.com/stretchr/testify from 1.9.0 to 1.10.0 ([#2488](https://github.com/kubeflow/spark-operator/pull/2488) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.1 ([#2487](https://github.com/kubeflow/spark-operator/pull/2487) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.4 ([#2486](https://github.com/kubeflow/spark-operator/pull/2486) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Deprecating sparkctl ([#2484](https://github.com/kubeflow/spark-operator/pull/2484) by [@vikas-saxena02](https://github.com/vikas-saxena02))
- Changing image repo from docker.io to ghcr.io ([#2483](https://github.com/kubeflow/spark-operator/pull/2483) by [@vikas-saxena02](https://github.com/vikas-saxena02))
- Upgrade Golang to 1.24.1 and golangci-lint to 1.64.8 ([#2494](https://github.com/kubeflow/spark-operator/pull/2494) by [@jacobsalway](https://github.com/jacobsalway))
- Bump helm.sh/helm/v3 from 3.16.2 to 3.17.3 ([#2503](https://github.com/kubeflow/spark-operator/pull/2503) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add changelog for v2.1.1 ([#2504](https://github.com/kubeflow/spark-operator/pull/2504) by [@ChenYi015](https://github.com/ChenYi015))
- Remove sparkctl ([#2466](https://github.com/kubeflow/spark-operator/pull/2466) by [@ChenYi015](https://github.com/ChenYi015))
- Bump github.com/spf13/viper from 1.19.0 to 1.20.1 ([#2496](https://github.com/kubeflow/spark-operator/pull/2496) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.37.0 to 0.38.0 ([#2505](https://github.com/kubeflow/spark-operator/pull/2505) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Remove clientset, informer and listers generated by code-generator ([#2506](https://github.com/kubeflow/spark-operator/pull/2506) by [@ChenYi015](https://github.com/ChenYi015))
- Remove v1beta1 API ([#2516](https://github.com/kubeflow/spark-operator/pull/2516) by [@ChenYi015](https://github.com/ChenYi015))
- add unit tests for driver and executor configs ([#2521](https://github.com/kubeflow/spark-operator/pull/2521) by [@nabuskey](https://github.com/nabuskey))
- Adding securityContext to spark examples ([#2530](https://github.com/kubeflow/spark-operator/pull/2530) by [@tarekabouzeid](https://github.com/tarekabouzeid))
- Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 ([#2497](https://github.com/kubeflow/spark-operator/pull/2497) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/mod from 0.23.0 to 0.24.0 ([#2495](https://github.com/kubeflow/spark-operator/pull/2495) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Adding Manabu to the reviewers ([#2522](https://github.com/kubeflow/spark-operator/pull/2522) by [@vara-bonthu](https://github.com/vara-bonthu))
- Bump manusa/actions-setup-minikube from 2.13.1 to 2.14.0 ([#2523](https://github.com/kubeflow/spark-operator/pull/2523) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump k8s.io dependencies to v0.32.5 ([#2540](https://github.com/kubeflow/spark-operator/pull/2540) by [@ChenYi015](https://github.com/ChenYi015))
- Pass the correct LDFLAGS when building the operator image ([#2541](https://github.com/kubeflow/spark-operator/pull/2541) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v2.1.1...v2.2.0)
## [v2.1.1](https://github.com/kubeflow/spark-operator/tree/v2.1.1) (2025-03-21)
### Features
- Adding seccompProfile RuntimeDefault ([#2397](https://github.com/kubeflow/spark-operator/pull/2397) by [@tarekabouzeid](https://github.com/tarekabouzeid))
- Add option for disabling leader election ([#2423](https://github.com/kubeflow/spark-operator/pull/2423) by [@ChenYi015](https://github.com/ChenYi015))
- Controller should only be granted event permissions in spark job namespaces ([#2426](https://github.com/kubeflow/spark-operator/pull/2426) by [@ChenYi015](https://github.com/ChenYi015))
- Make image optional ([#2439](https://github.com/kubeflow/spark-operator/pull/2439) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
- Support non-standard Spark container names ([#2441](https://github.com/kubeflow/spark-operator/pull/2441) by [@jbhalodia-slack](https://github.com/jbhalodia-slack))
- add support for metrics-job-start-latency-buckets flag in helm ([#2450](https://github.com/kubeflow/spark-operator/pull/2450) by [@nabuskey](https://github.com/nabuskey))
### Bug Fixes
- fix: webhook fail to add lifecycle to Spark3 executor pods ([#2458](https://github.com/kubeflow/spark-operator/pull/2458) by [@pvbouwel](https://github.com/pvbouwel))
- change env in executorSecretOption ([#2467](https://github.com/kubeflow/spark-operator/pull/2467) by [@TQJADE](https://github.com/TQJADE))
### Misc
- Move sparkctl to cmd directory ([#2347](https://github.com/kubeflow/spark-operator/pull/2347) by [@ChenYi015](https://github.com/ChenYi015))
- Bump golang.org/x/net from 0.30.0 to 0.32.0 ([#2350](https://github.com/kubeflow/spark-operator/pull/2350) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/crypto from 0.30.0 to 0.31.0 ([#2365](https://github.com/kubeflow/spark-operator/pull/2365) by [@dependabot[bot]](https://github.com/apps/dependabot))
- add an example of using prometheus servlet ([#2403](https://github.com/kubeflow/spark-operator/pull/2403) by [@nabuskey](https://github.com/nabuskey))
- Remove dependency on `k8s.io/kubernetes` ([#2398](https://github.com/kubeflow/spark-operator/pull/2398) by [@jacobsalway](https://github.com/jacobsalway))
- fix make deploy and install ([#2412](https://github.com/kubeflow/spark-operator/pull/2412) by [@nabuskey](https://github.com/nabuskey))
- Add helm unittest step to integration test workflow ([#2424](https://github.com/kubeflow/spark-operator/pull/2424) by [@ChenYi015](https://github.com/ChenYi015))
- ensure passed context is used ([#2432](https://github.com/kubeflow/spark-operator/pull/2432) by [@nabuskey](https://github.com/nabuskey))
- Bump manusa/actions-setup-minikube from 2.13.0 to 2.13.1 ([#2390](https://github.com/kubeflow/spark-operator/pull/2390) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump helm/chart-testing-action from 2.6.1 to 2.7.0 ([#2391](https://github.com/kubeflow/spark-operator/pull/2391) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/mod from 0.21.0 to 0.23.0 ([#2427](https://github.com/kubeflow/spark-operator/pull/2427) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/golang/glog from 1.2.2 to 1.2.4 ([#2411](https://github.com/kubeflow/spark-operator/pull/2411) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.32.0 to 0.35.0 ([#2428](https://github.com/kubeflow/spark-operator/pull/2428) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Support Kubernetes 1.32 ([#2416](https://github.com/kubeflow/spark-operator/pull/2416) by [@jacobsalway](https://github.com/jacobsalway))
- use cmd context in sparkctl ([#2447](https://github.com/kubeflow/spark-operator/pull/2447) by [@nabuskey](https://github.com/nabuskey))
- Bump golang.org/x/net from 0.35.0 to 0.36.0 ([#2470](https://github.com/kubeflow/spark-operator/pull/2470) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump aquasecurity/trivy-action from 0.29.0 to 0.30.0 ([#2475](https://github.com/kubeflow/spark-operator/pull/2475) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.35.0 to 0.37.0 ([#2472](https://github.com/kubeflow/spark-operator/pull/2472) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/containerd/containerd from 1.7.19 to 1.7.27 ([#2476](https://github.com/kubeflow/spark-operator/pull/2476) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump k8s.io/apimachinery from 0.32.0 to 0.32.3 ([#2474](https://github.com/kubeflow/spark-operator/pull/2474) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.66.0 to 1.78.2 ([#2473](https://github.com/kubeflow/spark-operator/pull/2473) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.28.0 to 1.29.9 ([#2463](https://github.com/kubeflow/spark-operator/pull/2463) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump sigs.k8s.io/scheduler-plugins from 0.29.8 to 0.30.6 ([#2444](https://github.com/kubeflow/spark-operator/pull/2444) by [@dependabot[bot]](https://github.com/apps/dependabot))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v2.1.0...v2.1.1)
## [v2.1.0](https://github.com/kubeflow/spark-operator/tree/v2.1.0) (2024-12-06)
### New Features
- Upgrade to Spark 3.5.3 ([#2202](https://github.com/kubeflow/spark-operator/pull/2202) by [@jacobsalway](https://github.com/jacobsalway))
- feat: support archives param for spark-submit ([#2256](https://github.com/kubeflow/spark-operator/pull/2256) by [@kaka-zb](https://github.com/kaka-zb))
- Allow --ingress-class-name to be specified in chart ([#2278](https://github.com/kubeflow/spark-operator/pull/2278) by [@jacobsalway](https://github.com/jacobsalway))
- Update default container security context ([#2265](https://github.com/kubeflow/spark-operator/pull/2265) by [@ChenYi015](https://github.com/ChenYi015))
- Support pod template for Spark 3.x applications ([#2141](https://github.com/kubeflow/spark-operator/pull/2141) by [@ChenYi015](https://github.com/ChenYi015))
- Allow setting automountServiceAccountToken ([#2298](https://github.com/kubeflow/spark-operator/pull/2298) by [@Aranch](https://github.com/Aransh))
- Allow the Controller and Webhook Containers to run with the securityContext: readOnlyRootfilesystem: true ([#2282](https://github.com/kubeflow/spark-operator/pull/2282) by [@npgretz](https://github.com/npgretz))
- Use NSS_WRAPPER_PASSWD instead of /etc/passwd as in spark-operator image entrypoint.sh ([#2312](https://github.com/kubeflow/spark-operator/pull/2312) by [@Aakcht](https://github.com/Aakcht))
### Bug Fixes
- Minor fixes to e2e test `make` targets ([#2242](https://github.com/kubeflow/spark-operator/pull/2242) by [@Tom-Newton](https://github.com/Tom-Newton))
- Added off heap memory to calculation for YuniKorn gang scheduling ([#2209](https://github.com/kubeflow/spark-operator/pull/2209) by [@guangyu-yang-rokt](https://github.com/guangyu-yang-rokt))
- Add permissions to controller serviceaccount to list and watch ingresses ([#2246](https://github.com/kubeflow/spark-operator/pull/2246) by [@tcassaert](https://github.com/tcassaert))
- Make sure enable-ui-service flag is set to false when controller.uiService.enable is set to false ([#2261](https://github.com/kubeflow/spark-operator/pull/2261) by [@Roberdvs](https://github.com/Roberdvs))
- `omitempty` corrections ([#2255](https://github.com/kubeflow/spark-operator/pull/2255) by [@Tom-Newton](https://github.com/Tom-Newton))
- Fix retries ([#2241](https://github.com/kubeflow/spark-operator/pull/2241) by [@Tom-Newton](https://github.com/Tom-Newton))
- Fix: executor container security context does not work ([#2306](https://github.com/kubeflow/spark-operator/pull/2306) by [@ChenYi015](https://github.com/ChenYi015))
- Fix: should not add emptyDir sizeLimit conf if it is nil ([#2305](https://github.com/kubeflow/spark-operator/pull/2305) by [@ChenYi015](https://github.com/ChenYi015))
- Fix: should not add emptyDir sizeLimit conf on executor pods if it is nil ([#2316](https://github.com/kubeflow/spark-operator/pull/2316) by [@Cian911](https://github.com/Cian911))
- Truncate UI service name if over 63 characters ([#2311](https://github.com/kubeflow/spark-operator/pull/2311) by [@jacobsalway](https://github.com/jacobsalway))
- The webhook-key-name command-line param isn't taking effect ([#2344](https://github.com/kubeflow/spark-operator/pull/2344) by [@c-h-afzal](https://github.com/c-h-afzal))
- Robustness to driver pod taking time to create ([#2315](https://github.com/kubeflow/spark-operator/pull/2315) by [@Tom-Newton](https://github.com/Tom-Newton))
### Misc
- remove redundant test.sh file ([#2243](https://github.com/kubeflow/spark-operator/pull/2243) by [@ChenYi015](https://github.com/ChenYi015))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.42 to 1.27.43 ([#2252](https://github.com/kubeflow/spark-operator/pull/2252) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump manusa/actions-setup-minikube from 2.12.0 to 2.13.0 ([#2247](https://github.com/kubeflow/spark-operator/pull/2247) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.29.0 to 0.30.0 ([#2251](https://github.com/kubeflow/spark-operator/pull/2251) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump aquasecurity/trivy-action from 0.24.0 to 0.27.0 ([#2248](https://github.com/kubeflow/spark-operator/pull/2248) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump gocloud.dev from 0.39.0 to 0.40.0 ([#2250](https://github.com/kubeflow/spark-operator/pull/2250) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add Quick Start guide to README ([#2259](https://github.com/kubeflow/spark-operator/pull/2259) by [@jacobsalway](https://github.com/jacobsalway))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.63.3 to 1.65.3 ([#2249](https://github.com/kubeflow/spark-operator/pull/2249) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add release badge to README ([#2263](https://github.com/kubeflow/spark-operator/pull/2263) by [@jacobsalway](https://github.com/jacobsalway))
- Bump helm.sh/helm/v3 from 3.16.1 to 3.16.2 ([#2275](https://github.com/kubeflow/spark-operator/pull/2275) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 ([#2274](https://github.com/kubeflow/spark-operator/pull/2274) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump cloud.google.com/go/storage from 1.44.0 to 1.45.0 ([#2273](https://github.com/kubeflow/spark-operator/pull/2273) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Run e2e tests with Kubernetes version matrix ([#2266](https://github.com/kubeflow/spark-operator/pull/2266) by [@jacobsalway](https://github.com/jacobsalway))
- Bump aquasecurity/trivy-action from 0.27.0 to 0.28.0 ([#2270](https://github.com/kubeflow/spark-operator/pull/2270) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.65.3 to 1.66.0 ([#2271](https://github.com/kubeflow/spark-operator/pull/2271) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.43 to 1.28.0 ([#2272](https://github.com/kubeflow/spark-operator/pull/2272) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add workflow for releasing sparkctl binary ([#2264](https://github.com/kubeflow/spark-operator/pull/2264) by [@ChenYi015](https://github.com/ChenYi015))
- Bump `volcano.sh/apis` to 1.10.0 ([#2320](https://github.com/kubeflow/spark-operator/pull/2320) by [@jacobsalway](https://github.com/jacobsalway))
- Bump aquasecurity/trivy-action from 0.28.0 to 0.29.0 ([#2332](https://github.com/kubeflow/spark-operator/pull/2332) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/onsi/ginkgo/v2 from 2.20.2 to 2.22.0 ([#2335](https://github.com/kubeflow/spark-operator/pull/2335) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Move sparkctl to cmd directory ([#2347](https://github.com/kubeflow/spark-operator/pull/2347) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/a8b5d6...v2.1.0)
## [v2.0.2](https://github.com/kubeflow/spark-operator/tree/v2.0.2) (2024-10-10)
### Bug Fixes
- Fix ingress capability discovery ([#2201](https://github.com/kubeflow/spark-operator/pull/2201) by [@jacobsalway](https://github.com/jacobsalway))
- fix: imagePullPolicy was ignored ([#2222](https://github.com/kubeflow/spark-operator/pull/2222) by [@missedone](https://github.com/missedone))
- fix: spark-submission failed due to lack of permission by user `spark` ([#2223](https://github.com/kubeflow/spark-operator/pull/2223) by [@missedone](https://github.com/missedone))
- Remove `cap_net_bind_service` from image ([#2216](https://github.com/kubeflow/spark-operator/pull/2216) by [@jacobsalway](https://github.com/jacobsalway))
- fix: webhook panics due to logging ([#2232](https://github.com/kubeflow/spark-operator/pull/2232) by [@ChenYi015](https://github.com/ChenYi015))
### Misc
- Bump github.com/aws/aws-sdk-go-v2 from 1.30.5 to 1.31.0 ([#2207](https://github.com/kubeflow/spark-operator/pull/2207) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/net from 0.28.0 to 0.29.0 ([#2205](https://github.com/kubeflow/spark-operator/pull/2205) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/docker/docker from 27.0.3+incompatible to 27.1.1+incompatible ([#2125](https://github.com/kubeflow/spark-operator/pull/2125) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.58.3 to 1.63.3 ([#2206](https://github.com/kubeflow/spark-operator/pull/2206) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Update integration test workflow and add golangci lint check ([#2197](https://github.com/kubeflow/spark-operator/pull/2197) by [@ChenYi015](https://github.com/ChenYi015))
- Bump github.com/aws/aws-sdk-go-v2 from 1.31.0 to 1.32.0 ([#2229](https://github.com/kubeflow/spark-operator/pull/2229) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump cloud.google.com/go/storage from 1.43.0 to 1.44.0 ([#2228](https://github.com/kubeflow/spark-operator/pull/2228) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump manusa/actions-setup-minikube from 2.11.0 to 2.12.0 ([#2226](https://github.com/kubeflow/spark-operator/pull/2226) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump golang.org/x/time from 0.6.0 to 0.7.0 ([#2227](https://github.com/kubeflow/spark-operator/pull/2227) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.33 to 1.27.42 ([#2231](https://github.com/kubeflow/spark-operator/pull/2231) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.4 ([#2204](https://github.com/kubeflow/spark-operator/pull/2204) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Add check for generating manifests and code ([#2234](https://github.com/kubeflow/spark-operator/pull/2234) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v2.0.1...v2.0.2)
## [v2.0.1](https://github.com/kubeflow/spark-operator/tree/v2.0.1) (2024-09-26)
### New Features
- FEATURE: build operator image as non-root ([#2171](https://github.com/kubeflow/spark-operator/pull/2171) by [@ImpSy](https://github.com/ImpSy))
### Bug Fixes
- Update controller RBAC for ConfigMap and PersistentVolumeClaim ([#2187](https://github.com/kubeflow/spark-operator/pull/2187) by [@ChenYi015](https://github.com/ChenYi015))
### Misc
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.20.2 ([#2188](https://github.com/kubeflow/spark-operator/pull/2188) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump github.com/onsi/gomega from 1.33.1 to 1.34.2 ([#2189](https://github.com/kubeflow/spark-operator/pull/2189) by [@dependabot[bot]](https://github.com/apps/dependabot))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v2.0.0...v2.0.1)
## [v2.0.0](https://github.com/kubeflow/spark-operator/tree/v2.0.0) (2024-09-23)
### Breaking Changes
- Use controller-runtime to reconstruct spark operator ([#2072](https://github.com/kubeflow/spark-operator/pull/2072) by [@ChenYi015](https://github.com/ChenYi015))
- feat: support driver and executor pod use different priority ([#2146](https://github.com/kubeflow/spark-operator/pull/2146) by [@Kevinz857](https://github.com/Kevinz857))
### New Features
- Support gang scheduling with Yunikorn ([#2107](https://github.com/kubeflow/spark-operator/pull/2107)) by [@jacobsalway](https://github.com/jacobsalway)
- Reintroduce option webhook.enable ([#2142](https://github.com/kubeflow/spark-operator/pull/2142) by [@ChenYi015](https://github.com/ChenYi015))
- Add default batch scheduler argument ([#2143](https://github.com/kubeflow/spark-operator/pull/2143) by [@jacobsalway](https://github.com/jacobsalway))
- Support extended kube-scheduler as batch scheduler ([#2136](https://github.com/kubeflow/spark-operator/pull/2136) by [@ChenYi015](https://github.com/ChenYi015))
- Set schedulerName to Yunikorn ([#2153](https://github.com/kubeflow/spark-operator/pull/2153) by [@jacobsalway](https://github.com/jacobsalway))
- Feature: Add pprof endpoint ([#2164](https://github.com/kubeflow/spark-operator/pull/2164) by [@ImpSy](https://github.com/ImpSy))
### Bug Fixes
- fix: Add default values for namespaces to match usage descriptions ([#2128](https://github.com/kubeflow/spark-operator/pull/2128) by [@snappyyouth](https://github.com/snappyyouth))
- Fix: Spark role binding did not render properly when setting spark service account name ([#2135](https://github.com/kubeflow/spark-operator/pull/2135) by [@ChenYi015](https://github.com/ChenYi015))
- fix: unable to set controller/webhook replicas to zero ([#2147](https://github.com/kubeflow/spark-operator/pull/2147) by [@ChenYi015](https://github.com/ChenYi015))
- Adding support for setting spark job namespaces to all namespaces ([#2123](https://github.com/kubeflow/spark-operator/pull/2123) by [@ChenYi015](https://github.com/ChenYi015))
- Fix: e2e test fails due to webhook not ready ([#2149](https://github.com/kubeflow/spark-operator/pull/2149) by [@ChenYi015](https://github.com/ChenYi015))
- fix: webhook not working when settings spark job namespaces to empty ([#2163](https://github.com/kubeflow/spark-operator/pull/2163) by [@ChenYi015](https://github.com/ChenYi015))
- fix: The logger had an odd number of arguments, making it panic ([#2166](https://github.com/kubeflow/spark-operator/pull/2166) by [@tcassaert](https://github.com/tcassaert))
- fix the make kind-delete-custer to avoid accidental kubeconfig deletion ([#2172](https://github.com/kubeflow/spark-operator/pull/2172) by [@ImpSy](https://github.com/ImpSy))
- Add specific error in log line when failed to create web UI service ([#2170](https://github.com/kubeflow/spark-operator/pull/2170) by [@tcassaert](https://github.com/tcassaert))
- Account for spark.executor.pyspark.memory in Yunikorn gang scheduling ([#2178](https://github.com/kubeflow/spark-operator/pull/2178) by [@jacobsalway](https://github.com/jacobsalway))
- Fix: spark application does not respect time to live seconds ([#2165](https://github.com/kubeflow/spark-operator/pull/2165) by [@ChenYi015](https://github.com/ChenYi015))
### Misc
- Update workflow and docs for releasing Spark operator ([#2089](https://github.com/kubeflow/spark-operator/pull/2089) by [@ChenYi015](https://github.com/ChenYi015))
- Fix broken integration test CI ([#2109](https://github.com/kubeflow/spark-operator/pull/2109) by [@ChenYi015](https://github.com/ChenYi015))
- Fix CI: environment variable BRANCH is missed ([#2111](https://github.com/kubeflow/spark-operator/pull/2111) by [@ChenYi015](https://github.com/ChenYi015))
- Update Makefile for building sparkctl ([#2119](https://github.com/kubeflow/spark-operator/pull/2119) by [@ChenYi015](https://github.com/ChenYi015))
- Update release workflow and docs ([#2121](https://github.com/kubeflow/spark-operator/pull/2121) by [@ChenYi015](https://github.com/ChenYi015))
- Run e2e tests on Kind ([#2148](https://github.com/kubeflow/spark-operator/pull/2148) by [@jacobsalway](https://github.com/jacobsalway))
- Upgrade to Go 1.23.1 ([#2155](https://github.com/kubeflow/spark-operator/pull/2155) by [@jacobsalway](https://github.com/jacobsalway))
- Upgrade to Spark 3.5.2 ([#2154](https://github.com/kubeflow/spark-operator/pull/2154) by [@jacobsalway](https://github.com/jacobsalway))
- Bump sigs.k8s.io/scheduler-plugins from 0.29.7 to 0.29.8 ([#2159](https://github.com/kubeflow/spark-operator/pull/2159) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump gocloud.dev from 0.37.0 to 0.39.0 ([#2160](https://github.com/kubeflow/spark-operator/pull/2160) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Update e2e tests ([#2161](https://github.com/kubeflow/spark-operator/pull/2161) by [@ChenYi015](https://github.com/ChenYi015))
- Upgrade to Spark 3.5.2 (#2012) ([#2157](https://github.com/kubeflow/spark-operator/pull/2157) by [@ha2hi](https://github.com/ha2hi))
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.27 to 1.27.33 ([#2174](https://github.com/kubeflow/spark-operator/pull/2174) by [@dependabot[bot]](https://github.com/apps/dependabot))
- Bump helm.sh/helm/v3 from 3.15.3 to 3.16.1 ([#2173](https://github.com/kubeflow/spark-operator/pull/2173) by [@dependabot[bot]](https://github.com/apps/dependabot))
- implement workflow to scan latest released docker image ([#2177](https://github.com/kubeflow/spark-operator/pull/2177) by [@ImpSy](https://github.com/ImpSy))
### What's Changed
- Cherry pick #2081 #2046 #2091 #2072 by @ChenYi015 in <https://github.com/kubeflow/spark-operator/pull/2108>
- Cherry pick #2089 #2109 #2111 by @ChenYi015 in <https://github.com/kubeflow/spark-operator/pull/2110>
- Release v2.0.0-rc.0 by @ChenYi015 in <https://github.com/kubeflow/spark-operator/pull/2115>
- Cherry pick commits for releasing v2.0.0 by @ChenYi015 in <https://github.com/kubeflow/spark-operator/pull/2156>
- Release v2.0.0 by @ChenYi015 in <https://github.com/kubeflow/spark-operator/pull/2182>
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/v1beta2-1.6.2-3.5.0...v2.0.0)
## [v2.0.0-rc.0](https://github.com/kubeflow/spark-operator/tree/v2.0.0-rc.0) (2024-08-09)
### Breaking Changes
- Use controller-runtime to reconstruct the Spark operator ([#2072](https://github.com/kubeflow/spark-operator/pull/2072) by [@ChenYi015](https://github.com/ChenYi015))
### Misc
- Fix CI: environment variable BRANCH is missing ([#2111](https://github.com/kubeflow/spark-operator/pull/2111) by [@ChenYi015](https://github.com/ChenYi015))
- Fix broken integration test CI ([#2109](https://github.com/kubeflow/spark-operator/pull/2109) by [@ChenYi015](https://github.com/ChenYi015))
- Update workflow and docs for releasing Spark operator ([#2089](https://github.com/kubeflow/spark-operator/pull/2089) by [@ChenYi015](https://github.com/ChenYi015))
### What's Changed
- Release v2.0.0-rc.0 ([#2115](https://github.com/kubeflow/spark-operator/pull/2115) by [@ChenYi015](https://github.com/ChenYi015))
- Cherry pick #2089 #2109 #2111 ([#2110](https://github.com/kubeflow/spark-operator/pull/2110) by [@ChenYi015](https://github.com/ChenYi015))
- Cherry pick #2081 #2046 #2091 #2072 ([#2108](https://github.com/kubeflow/spark-operator/pull/2108) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.3...v2.0.0-rc.0)
## [spark-operator-chart-1.4.6](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.6) (2024-07-26)
- Add topologySpreadConstraints ([#2091](https://github.com/kubeflow/spark-operator/pull/2091) by [@jbhalodia-slack](https://github.com/jbhalodia-slack)) (values sketch below)
- Add Alibaba Cloud to adopters ([#2097](https://github.com/kubeflow/spark-operator/pull/2097) by [@ChenYi015](https://github.com/ChenYi015))
- Update Stale bot settings ([#2095](https://github.com/kubeflow/spark-operator/pull/2095) by [@andreyvelich](https://github.com/andreyvelich))
- Add @ChenYi015 to approvers ([#2096](https://github.com/kubeflow/spark-operator/pull/2096) by [@ChenYi015](https://github.com/ChenYi015))
- Add CHANGELOG.md file and use python script to generate it automatically ([#2087](https://github.com/kubeflow/spark-operator/pull/2087) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.5...spark-operator-chart-1.4.6)
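For the topologySpreadConstraints change above, the chart value mirrors the core/v1 API. A hedged values.yaml sketch; the key placement and the label selector handling are assumptions, so check the chart's values:

```yaml
# values.yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    # labelSelector is typically filled in from the chart's own selector labels
```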
## [spark-operator-chart-1.4.5](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.5) (2024-07-22)
- Update the process to build api-docs, generate CRD manifests and code ([#2046](https://github.com/kubeflow/spark-operator/pull/2046) by [@ChenYi015](https://github.com/ChenYi015))
- Add workflow for closing stale issues and PRs ([#2073](https://github.com/kubeflow/spark-operator/pull/2073) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.4...spark-operator-chart-1.4.5)
## [spark-operator-chart-1.4.4](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.4) (2024-07-22)
- Update helm docs ([#2081](https://github.com/kubeflow/spark-operator/pull/2081) by [@csp33](https://github.com/csp33))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.3...spark-operator-chart-1.4.4)
## [spark-operator-chart-1.4.3](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.3) (2024-07-03)
- Add PodDisruptionBudget to chart ([#2078](https://github.com/kubeflow/spark-operator/pull/2078) by [@csp33](https://github.com/csp33))
- Update README and documentation ([#2047](https://github.com/kubeflow/spark-operator/pull/2047) by [@ChenYi015](https://github.com/ChenYi015))
- Add code of conduct and update contributor guide ([#2074](https://github.com/kubeflow/spark-operator/pull/2074) by [@ChenYi015](https://github.com/ChenYi015))
- Remove .gitlab-ci.yml ([#2069](https://github.com/kubeflow/spark-operator/pull/2069) by [@jacobsalway](https://github.com/jacobsalway))
- Modified README.MD as per changes discussed on <https://github.com/kubeflow/spark-operator/pull/2062> ([#2066](https://github.com/kubeflow/spark-operator/pull/2066) by [@vikas-saxena02](https://github.com/vikas-saxena02))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.2...spark-operator-chart-1.4.3)
## [spark-operator-chart-1.4.2](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.2) (2024-06-17)
- Support objectSelector on mutating webhook ([#2058](https://github.com/kubeflow/spark-operator/pull/2058) by [@Cian911](https://github.com/Cian911)) (values sketch below)
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.1...spark-operator-chart-1.4.2)
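For the objectSelector support above, a hedged values.yaml sketch; the exact key under the `webhook` block is an assumption:

```yaml
webhook:
  objectSelector:
    matchLabels:
      spark-webhook: "true"   # only pods carrying this label are mutated
```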
## [spark-operator-chart-1.4.1](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.1) (2024-06-15)
- Adding an option to set the priority class for spark-operator pod ([#2043](https://github.com/kubeflow/spark-operator/pull/2043) by [@pkgajulapalli](https://github.com/pkgajulapalli)) (values sketch below)
- Update minikube version in CI ([#2059](https://github.com/kubeflow/spark-operator/pull/2059) by [@Cian911](https://github.com/Cian911))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.4.0...spark-operator-chart-1.4.1)
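For the priority class option above, a hedged values.yaml sketch (the top-level key is an assumption; the PriorityClass itself must already exist in the cluster):

```yaml
# values.yaml
priorityClassName: high-priority
```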
## [spark-operator-chart-1.4.0](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.4.0) (2024-06-05)
- Certificates are generated by the operator rather than gencerts.sh ([#2016](https://github.com/kubeflow/spark-operator/pull/2016) by [@ChenYi015](https://github.com/ChenYi015))
- Add ChenYi015 as spark-operator reviewer ([#2045](https://github.com/kubeflow/spark-operator/pull/2045) by [@ChenYi015](https://github.com/ChenYi015))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.3.2...spark-operator-chart-1.4.0)
## [spark-operator-chart-1.3.2](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.3.2) (2024-06-05)
- Bump appVersion to v1beta2-1.5.0-3.5.0 ([#2044](https://github.com/kubeflow/spark-operator/pull/2044) by [@ChenYi015](https://github.com/ChenYi015))
- Add restartPolicy field to SparkApplication Driver/Executor initContainers CRDs ([#2022](https://github.com/kubeflow/spark-operator/pull/2022) by [@mschroering](https://github.com/mschroering))
- :memo: Add Inter&Co to who-is-using.md ([#2040](https://github.com/kubeflow/spark-operator/pull/2040) by [@ignitz](https://github.com/ignitz))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.3.1...spark-operator-chart-1.3.2)
## [spark-operator-chart-1.3.1](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.3.1) (2024-05-31)
- Chart: add POD_NAME env for leader election ([#2039](https://github.com/kubeflow/spark-operator/pull/2039) by [@Aakcht](https://github.com/Aakcht))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.3.0...spark-operator-chart-1.3.1)
## [spark-operator-chart-1.3.0](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.3.0) (2024-05-20)
- Support exposing extra TCP ports in Spark Driver via K8s Ingress ([#1998](https://github.com/kubeflow/spark-operator/pull/1998) by [@hiboyang](https://github.com/hiboyang))
- Fixes a bug with dynamic allocation forcing the executor count to be 1 even when minExecutors is set to 0 ([#1979](https://github.com/kubeflow/spark-operator/pull/1979) by [@peter-mcclonski](https://github.com/peter-mcclonski)) (spec sketch below)
- Remove outdated PySpark experimental warning in example ([#2014](https://github.com/kubeflow/spark-operator/pull/2014) by [@andrejpk](https://github.com/andrejpk))
- Update Spark Job Namespace docs ([#2000](https://github.com/kubeflow/spark-operator/pull/2000) by [@matthewrossi](https://github.com/matthewrossi))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.15...spark-operator-chart-1.3.0)
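The dynamic allocation fix above is easiest to see in the spec it repairs; with the v1beta2 `dynamicAllocation` block, a zero floor is now honored:

```yaml
spec:
  dynamicAllocation:
    enabled: true
    initialExecutors: 1
    minExecutors: 0    # no longer forced up to 1 after #1979
    maxExecutors: 10
```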
## [spark-operator-chart-1.2.15](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.15) (2024-05-07)
- Fix examples ([#2010](https://github.com/kubeflow/spark-operator/pull/2010) by [@peter-mcclonski](https://github.com/peter-mcclonski))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.14...spark-operator-chart-1.2.15)
## [spark-operator-chart-1.2.14](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.14) (2024-04-26)
- feat: add support for service labels on driver-svc ([#1985](https://github.com/kubeflow/spark-operator/pull/1985) by [@Cian911](https://github.com/Cian911))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.13...spark-operator-chart-1.2.14)
## [spark-operator-chart-1.2.13](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.13) (2024-04-24)
- fix(chart): remove operator namespace default for job namespaces value ([#1989](https://github.com/kubeflow/spark-operator/pull/1989) by [@t3mi](https://github.com/t3mi))
- Fix Docker Hub Credentials in CI ([#2003](https://github.com/kubeflow/spark-operator/pull/2003) by [@andreyvelich](https://github.com/andreyvelich))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.12...spark-operator-chart-1.2.13)
## [spark-operator-chart-1.2.12](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.12) (2024-04-19)
- Add emptyDir sizeLimit support for local dirs ([#1993](https://github.com/kubeflow/spark-operator/pull/1993) by [@jacobsalway](https://github.com/jacobsalway)) (spec sketch below)
- fix: Removed `publish-image` dependency on publishing the helm chart ([#1995](https://github.com/kubeflow/spark-operator/pull/1995) by [@vara-bonthu](https://github.com/vara-bonthu))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.11...spark-operator-chart-1.2.12)
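For the emptyDir sizeLimit change above: the operator treats volumes whose names start with `spark-local-dir-` as Spark scratch space, so a capped local dir looks roughly like this sketch:

```yaml
spec:
  volumes:
    - name: spark-local-dir-1
      emptyDir:
        sizeLimit: 10Gi        # the field #1993 starts propagating
  executor:
    volumeMounts:
      - name: spark-local-dir-1
        mountPath: /tmp/spark-local-dir
```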
## [spark-operator-chart-1.2.11](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.11) (2024-04-19)
- fix: Update Github workflow to publish Helm charts on chart changes, irrespective of image change ([#1992](https://github.com/kubeflow/spark-operator/pull/1992) by [@vara-bonthu](https://github.com/vara-bonthu))
- chore: Add Timo to user list ([#1615](https://github.com/kubeflow/spark-operator/pull/1615) by [@vanducng](https://github.com/vanducng))
- Update spark operator permissions for CRD ([#1973](https://github.com/kubeflow/spark-operator/pull/1973) by [@ChenYi015](https://github.com/ChenYi015))
- fix spark-rbac ([#1986](https://github.com/kubeflow/spark-operator/pull/1986) by [@Aransh](https://github.com/Aransh))
- Use Kubeflow Docker Hub for Spark Operator Image ([#1974](https://github.com/kubeflow/spark-operator/pull/1974) by [@andreyvelich](https://github.com/andreyvelich))
- fix: fixed serviceaccount annotations ([#1972](https://github.com/kubeflow/spark-operator/pull/1972) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.7...spark-operator-chart-1.2.11)
## [spark-operator-chart-1.2.7](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.7) (2024-04-16)
- fix: upgraded k8s deps ([#1983](https://github.com/kubeflow/spark-operator/pull/1983) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
- chore: remove k8s.io/kubernetes replaces and adapt to v1.29.3 apis ([#1968](https://github.com/kubeflow/spark-operator/pull/1968) by [@ajayk](https://github.com/ajayk))
- Add some helm chart unit tests and fix spark service account render failure when extra annotations are specified ([#1967](https://github.com/kubeflow/spark-operator/pull/1967) by [@ChenYi015](https://github.com/ChenYi015))
- feat: Doc updates, Issue and PR templates are added ([#1970](https://github.com/kubeflow/spark-operator/pull/1970) by [@vara-bonthu](https://github.com/vara-bonthu))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.5...spark-operator-chart-1.2.7)
## [spark-operator-chart-1.2.5](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.5) (2024-04-14)
- fixed docker image tag and updated chart docs ([#1969](https://github.com/kubeflow/spark-operator/pull/1969) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.2.4...spark-operator-chart-1.2.5)
## [spark-operator-chart-1.2.4](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.2.4) (2024-04-13)
- publish chart independently, incremented both chart and image versions to trigger build of both ([#1964](https://github.com/kubeflow/spark-operator/pull/1964) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
- Update helm chart README ([#1958](https://github.com/kubeflow/spark-operator/pull/1958) by [@ChenYi015](https://github.com/ChenYi015))
- fix: add containerPort declaration for webhook in helm chart ([#1961](https://github.com/kubeflow/spark-operator/pull/1961) by [@zevisert](https://github.com/zevisert))
- added id for a build job to fix digests artifact creation ([#1963](https://github.com/kubeflow/spark-operator/pull/1963) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
- support multiple namespaces ([#1955](https://github.com/kubeflow/spark-operator/pull/1955) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
- chore: replace GoogleCloudPlatform/spark-on-k8s-operator with kubeflow/spark-operator ([#1937](https://github.com/kubeflow/spark-operator/pull/1937) by [@zevisert](https://github.com/zevisert))
- Chart: add patch permissions for spark operator SA to support spark 3.5.0 ([#1884](https://github.com/kubeflow/spark-operator/pull/1884) by [@Aakcht](https://github.com/Aakcht))
- Cleanup after golang upgrade ([#1956](https://github.com/kubeflow/spark-operator/pull/1956) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
- feat: add support for custom service labels ([#1952](https://github.com/kubeflow/spark-operator/pull/1952) by [@Cian911](https://github.com/Cian911))
- upgraded golang and dependencies ([#1954](https://github.com/kubeflow/spark-operator/pull/1954) by [@AndrewChubatiuk](https://github.com/AndrewChubatiuk))
- README for installing operator using kustomize with custom namespace and service name ([#1778](https://github.com/kubeflow/spark-operator/pull/1778) by [@shahsiddharth08](https://github.com/shahsiddharth08))
- BUGFIX: Added cancel method to fix context leak ([#1917](https://github.com/kubeflow/spark-operator/pull/1917) by [@fazledyn-or](https://github.com/fazledyn-or))
- remove unmatched quotes from user-guide.md ([#1584](https://github.com/kubeflow/spark-operator/pull/1584) by [@taeyeopkim1](https://github.com/taeyeopkim1))
- Add PVC permission to Operator role ([#1889](https://github.com/kubeflow/spark-operator/pull/1889) by [@wyangsun](https://github.com/wyangsun))
- Allow to set webhook job resource limits (#1429,#1300) ([#1946](https://github.com/kubeflow/spark-operator/pull/1946) by [@karbyshevds](https://github.com/karbyshevds))
- Create OWNERS ([#1927](https://github.com/kubeflow/spark-operator/pull/1927) by [@zijianjoy](https://github.com/zijianjoy))
- fix: fix issue #1723 about spark-operator not working with volcano on OCP ([#1724](https://github.com/kubeflow/spark-operator/pull/1724) by [@disaster37](https://github.com/disaster37))
- Add Rokt to who-is-using.md ([#1867](https://github.com/kubeflow/spark-operator/pull/1867) by [@jacobsalway](https://github.com/jacobsalway))
- Handle invalid API resources in discovery ([#1758](https://github.com/kubeflow/spark-operator/pull/1758) by [@wiltonsr](https://github.com/wiltonsr))
- Fix docs for Volcano integration ([#1719](https://github.com/kubeflow/spark-operator/pull/1719) by [@VVKot](https://github.com/VVKot))
- Added qualytics to who is using ([#1736](https://github.com/kubeflow/spark-operator/pull/1736) by [@josecsotomorales](https://github.com/josecsotomorales))
- Allowing optional annotation on rbac ([#1770](https://github.com/kubeflow/spark-operator/pull/1770) by [@cxfcxf](https://github.com/cxfcxf))
- Support `seccompProfile` in Spark application CRD and fix pre-commit jobs ([#1768](https://github.com/kubeflow/spark-operator/pull/1768) by [@ordukhanian](https://github.com/ordukhanian)) (spec sketch below)
- Updating webhook docs to also mention eks ([#1763](https://github.com/kubeflow/spark-operator/pull/1763) by [@JunaidChaudry](https://github.com/JunaidChaudry))
- Link to helm docs fixed ([#1783](https://github.com/kubeflow/spark-operator/pull/1783) by [@haron](https://github.com/haron))
- Improve getMasterURL() to add [] to IPv6 if needed ([#1825](https://github.com/kubeflow/spark-operator/pull/1825) by [@LittleWat](https://github.com/LittleWat))
- Add envFrom to operator deployment ([#1785](https://github.com/kubeflow/spark-operator/pull/1785) by [@matschaffer-roblox](https://github.com/matschaffer-roblox))
- Expand ingress docs a bit ([#1806](https://github.com/kubeflow/spark-operator/pull/1806) by [@matschaffer-roblox](https://github.com/matschaffer-roblox))
- Optional sidecars for operator pod ([#1754](https://github.com/kubeflow/spark-operator/pull/1754) by [@qq157755587](https://github.com/qq157755587))
- Add Roblox to who-is ([#1784](https://github.com/kubeflow/spark-operator/pull/1784) by [@matschaffer-roblox](https://github.com/matschaffer-roblox))
- Molex started using the Spark K8s operator. ([#1714](https://github.com/kubeflow/spark-operator/pull/1714) by [@AshishPushpSingh](https://github.com/AshishPushpSingh))
- Extra helm chart labels ([#1669](https://github.com/kubeflow/spark-operator/pull/1669) by [@kvanzuijlen](https://github.com/kvanzuijlen))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.27...spark-operator-chart-1.2.4)
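For the seccompProfile support above, the security context follows the usual core/v1 shape; a sketch:

```yaml
spec:
  driver:
    securityContext:
      seccompProfile:
        type: RuntimeDefault
```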
## [spark-operator-chart-1.1.27](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.27) (2023-03-17)
- Added permissions for leader election #1635 ([#1647](https://github.com/kubeflow/spark-operator/pull/1647) by [@ordukhanian](https://github.com/ordukhanian))
- Fix #1393: tolerations block in wrong segment for webhook jobs ([#1633](https://github.com/kubeflow/spark-operator/pull/1633) by [@zhiminglim](https://github.com/zhiminglim))
- add dependabot ([#1629](https://github.com/kubeflow/spark-operator/pull/1629) by [@monotek](https://github.com/monotek))
- Add support for `ephemeral.volumeClaimTemplate` in helm chart CRDs ([#1661](https://github.com/kubeflow/spark-operator/pull/1661) by [@ArshiAAkhavan](https://github.com/ArshiAAkhavan)) (spec sketch below)
- Add Kognita to "Who is using" ([#1637](https://github.com/kubeflow/spark-operator/pull/1637) by [@claudino-kognita](https://github.com/claudino-kognita))
- add lifecycle to executor ([#1674](https://github.com/kubeflow/spark-operator/pull/1674) by [@tiechengsu](https://github.com/tiechengsu))
- Fix signal handling for non-leader processes ([#1680](https://github.com/kubeflow/spark-operator/pull/1680) by [@antonipp](https://github.com/antonipp))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.26...spark-operator-chart-1.1.27)
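For the `ephemeral.volumeClaimTemplate` support above, the volume follows the standard Kubernetes generic-ephemeral-volume shape; a sketch:

```yaml
spec:
  volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi
```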
## [spark-operator-chart-1.1.26](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.26) (2022-10-25)
- update go to 1.19 + k8s.io libs to v0.25.3 ([#1630](https://github.com/kubeflow/spark-operator/pull/1630) by [@ImpSy](https://github.com/ImpSy))
- Update README - secrets and sidecars need mutating webhooks ([#1550](https://github.com/kubeflow/spark-operator/pull/1550) by [@djdillon](https://github.com/djdillon))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.25...spark-operator-chart-1.1.26)
## [spark-operator-chart-1.1.25](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.25) (2022-06-08)
- Webhook init and cleanup should respect nodeSelector ([#1545](https://github.com/kubeflow/spark-operator/pull/1545) by [@erikcw](https://github.com/erikcw))
- rename unit tests to integration tests in Makefile#integration-test ([#1539](https://github.com/kubeflow/spark-operator/pull/1539) by [@dcoliversun](https://github.com/dcoliversun))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.24...spark-operator-chart-1.1.25)
## [spark-operator-chart-1.1.24](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.24) (2022-06-01)
- Fix: use V1 api for CRDs for volcano integration ([#1540](https://github.com/kubeflow/spark-operator/pull/1540) by [@Aakcht](https://github.com/Aakcht))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.23...spark-operator-chart-1.1.24)
## [spark-operator-chart-1.1.23](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.23) (2022-05-18)
- fix: add pre-upgrade hook to rbac resources ([#1511](https://github.com/kubeflow/spark-operator/pull/1511) by [@cwyl02](https://github.com/cwyl02))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.22...spark-operator-chart-1.1.23)
## [spark-operator-chart-1.1.22](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.22) (2022-05-16)
- Fixes issue #1467 (issue when deleting SparkApplication without metrics server) ([#1530](https://github.com/kubeflow/spark-operator/pull/1530) by [@aneagoe](https://github.com/aneagoe))
- Implement --logs and --delete flags on 'sparkctl create' and a timeout on 'sparkctl log' to wait for pod startup ([#1506](https://github.com/kubeflow/spark-operator/pull/1506) by [@alaurentinoofficial](https://github.com/alaurentinoofficial))
- Fix Spark UI URL in app status ([#1518](https://github.com/kubeflow/spark-operator/pull/1518) by [@gtopper](https://github.com/gtopper))
- remove quotes from yaml file ([#1524](https://github.com/kubeflow/spark-operator/pull/1524) by [@zencircle](https://github.com/zencircle))
- Added missing manifest yaml, point the manifest to the right direction ([#1504](https://github.com/kubeflow/spark-operator/pull/1504) by [@RonZhang724](https://github.com/RonZhang724))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.21...spark-operator-chart-1.1.22)
## [spark-operator-chart-1.1.21](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.21) (2022-05-12)
- Ensure that driver is deleted prior to sparkapplication resubmission ([#1521](https://github.com/kubeflow/spark-operator/pull/1521) by [@khorshuheng](https://github.com/khorshuheng))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.20...spark-operator-chart-1.1.21)
## [spark-operator-chart-1.1.20](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.20) (2022-04-11)
- Add ingress-class-name controller flag ([#1482](https://github.com/kubeflow/spark-operator/pull/1482) by [@voyvodov](https://github.com/voyvodov))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.19...spark-operator-chart-1.1.20)
## [spark-operator-chart-1.1.19](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.19) (2022-02-14)
- Add Operator volumes and volumeMounts in chart ([#1475](https://github.com/kubeflow/spark-operator/pull/1475) by [@ocworld](https://github.com/ocworld))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.18...spark-operator-chart-1.1.19)
## [spark-operator-chart-1.1.18](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.18) (2022-02-13)
- Updated default registry to ghcr.io ([#1454](https://github.com/kubeflow/spark-operator/pull/1454) by [@aneagoe](https://github.com/aneagoe))
- Github actions workflow fix for Helm chart deployment ([#1456](https://github.com/kubeflow/spark-operator/pull/1456) by [@vara-bonthu](https://github.com/vara-bonthu))
- Kubernetes v1.22 extensions/v1beta1 API removal ([#1427](https://github.com/kubeflow/spark-operator/pull/1427) by [@aneagoe](https://github.com/aneagoe))
- Fixes an issue with github action in job build-spark-operator ([#1452](https://github.com/kubeflow/spark-operator/pull/1452) by [@aneagoe](https://github.com/aneagoe))
- use github container registry instead of gcr.io for releases ([#1422](https://github.com/kubeflow/spark-operator/pull/1422) by [@TomHellier](https://github.com/TomHellier))
- Fixes an error that was preventing the pods from being mutated ([#1421](https://github.com/kubeflow/spark-operator/pull/1421) by [@ssullivan](https://github.com/ssullivan))
- Make github actions more feature complete ([#1418](https://github.com/kubeflow/spark-operator/pull/1418) by [@TomHellier](https://github.com/TomHellier))
- Resolves an error when deploying the webhook where the k8s api indica… ([#1413](https://github.com/kubeflow/spark-operator/pull/1413) by [@ssullivan](https://github.com/ssullivan))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.15...spark-operator-chart-1.1.18)
## [spark-operator-chart-1.1.15](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.15) (2021-12-02)
- Add docker build to github action ([#1415](https://github.com/kubeflow/spark-operator/pull/1415) by [@TomHellier](https://github.com/TomHellier))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.14...spark-operator-chart-1.1.15)
## [spark-operator-chart-1.1.14](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.14) (2021-11-30)
- Updating API version of admissionregistration.k8s.io ([#1401](https://github.com/kubeflow/spark-operator/pull/1401) by [@sairamankumar2](https://github.com/sairamankumar2))
- Add C2FO to who is using ([#1391](https://github.com/kubeflow/spark-operator/pull/1391) by [@vanhoale](https://github.com/vanhoale))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.13...spark-operator-chart-1.1.14)
## [spark-operator-chart-1.1.13](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.13) (2021-11-18)
- Delete service accounts and roles before creation ([#1384](https://github.com/kubeflow/spark-operator/pull/1384) by [@TiansuYu](https://github.com/TiansuYu))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.12...spark-operator-chart-1.1.13)
## [spark-operator-chart-1.1.12](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.12) (2021-11-14)
- webhook timeout variable ([#1387](https://github.com/kubeflow/spark-operator/pull/1387) by [@sairamankumar2](https://github.com/sairamankumar2))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.11...spark-operator-chart-1.1.12)
## [spark-operator-chart-1.1.11](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.11) (2021-11-12)
- [FIX] add service account access to persistentvolumeclaims ([#1390](https://github.com/kubeflow/spark-operator/pull/1390) by [@mschroering](https://github.com/mschroering))
- Add DeepCure to who is using ([#1389](https://github.com/kubeflow/spark-operator/pull/1389) by [@mschroering](https://github.com/mschroering))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.10...spark-operator-chart-1.1.11)
## [spark-operator-chart-1.1.10](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.10) (2021-11-09)
- Add custom toleration support for webhook jobs ([#1383](https://github.com/kubeflow/spark-operator/pull/1383) by [@korjek](https://github.com/korjek))
- fix container name in addsecuritycontext patch ([#1377](https://github.com/kubeflow/spark-operator/pull/1377) by [@lybavsky](https://github.com/lybavsky))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.9...spark-operator-chart-1.1.10)
## [spark-operator-chart-1.1.9](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.9) (2021-11-01)
- `Role` and `RoleBinding` not installed for `webhook-init` in Helm `pre-hook` ([#1379](https://github.com/kubeflow/spark-operator/pull/1379) by [@zzvara](https://github.com/zzvara))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.8...spark-operator-chart-1.1.9)
## [spark-operator-chart-1.1.8](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.8) (2021-10-26)
- Regenerate deleted cert after upgrade ([#1373](https://github.com/kubeflow/spark-operator/pull/1373) by [@simplylizz](https://github.com/simplylizz))
- Make manifests usable by Kustomize ([#1367](https://github.com/kubeflow/spark-operator/pull/1367) by [@karpoftea](https://github.com/karpoftea))
- #1329 update the operator to allow subpaths to be used with the spark ui ingress. ([#1330](https://github.com/kubeflow/spark-operator/pull/1330) by [@TomHellier](https://github.com/TomHellier))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.7...spark-operator-chart-1.1.8)
## [spark-operator-chart-1.1.7](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.7) (2021-10-21)
- serviceAccount annotations ([#1350](https://github.com/kubeflow/spark-operator/pull/1350) by [@moskitone](https://github.com/moskitone))
- Update Dockerfile ([#1369](https://github.com/kubeflow/spark-operator/pull/1369) by [@Sadagopan88](https://github.com/Sadagopan88))
- [FIX] tolerations are not directly present in Driver(/Executor)Spec ([#1365](https://github.com/kubeflow/spark-operator/pull/1365) by [@s-pedamallu](https://github.com/s-pedamallu))
- fix running metrics for application deletion ([#1358](https://github.com/kubeflow/spark-operator/pull/1358) by [@Aakcht](https://github.com/Aakcht))
- Update who-is-using.md ([#1338](https://github.com/kubeflow/spark-operator/pull/1338) by [@Juandavi1](https://github.com/Juandavi1))
- Update who-is-using.md ([#1082](https://github.com/kubeflow/spark-operator/pull/1082) by [@Juandavi1](https://github.com/Juandavi1))
- Add support for executor service account ([#1322](https://github.com/kubeflow/spark-operator/pull/1322) by [@bbenzikry](https://github.com/bbenzikry)) (spec sketch below)
- fix NPE introduce on #1280 ([#1325](https://github.com/kubeflow/spark-operator/pull/1325) by [@ImpSy](https://github.com/ImpSy))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.6...spark-operator-chart-1.1.7)
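For the executor service account support above, a sketch assuming the executor spec gained the same `serviceAccount` field the driver already had (account names are placeholders):

```yaml
spec:
  driver:
    serviceAccount: spark-driver-sa
  executor:
    serviceAccount: spark-executor-sa
```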
## [spark-operator-chart-1.1.6](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.6) (2021-08-04)
- Add hook deletion policy for spark-operator service account ([#1313](https://github.com/kubeflow/spark-operator/pull/1313) by [@pdrastil](https://github.com/pdrastil))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.5...spark-operator-chart-1.1.6)
## [spark-operator-chart-1.1.5](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.5) (2021-07-28)
- Add user defined pod labels ([#1288](https://github.com/kubeflow/spark-operator/pull/1288) by [@pdrastil](https://github.com/pdrastil))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.4...spark-operator-chart-1.1.5)
## [spark-operator-chart-1.1.4](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.4) (2021-07-25)
- Migrate CRDs from v1beta1 to v1. Add additionalPrinterColumns ([#1298](https://github.com/kubeflow/spark-operator/pull/1298) by [@drazul](https://github.com/drazul))
- Explain "signal: kill" errors during submission ([#1292](https://github.com/kubeflow/spark-operator/pull/1292) by [@zzvara](https://github.com/zzvara))
- fix the invalid repo address ([#1291](https://github.com/kubeflow/spark-operator/pull/1291) by [@william-wang](https://github.com/william-wang))
- add failure context to recordExecutorEvent ([#1280](https://github.com/kubeflow/spark-operator/pull/1280) by [@ImpSy](https://github.com/ImpSy))
- Update pythonVersion to fix example ([#1284](https://github.com/kubeflow/spark-operator/pull/1284) by [@stratus](https://github.com/stratus))
- add crds drift check between chart/ and manifest/ ([#1272](https://github.com/kubeflow/spark-operator/pull/1272) by [@ImpSy](https://github.com/ImpSy))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.3...spark-operator-chart-1.1.4)
## [spark-operator-chart-1.1.3](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.3) (2021-05-25)
- Allow user to specify service annotation on Spark UI service ([#1264](https://github.com/kubeflow/spark-operator/pull/1264) by [@khorshuheng](https://github.com/khorshuheng))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.2...spark-operator-chart-1.1.3)
## [spark-operator-chart-1.1.2](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.2) (2021-05-25)
- implement shareProcessNamespace in SparkPodSpec ([#1262](https://github.com/kubeflow/spark-operator/pull/1262) by [@ImpSy](https://github.com/ImpSy))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.1...spark-operator-chart-1.1.2)
## [spark-operator-chart-1.1.1](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.1) (2021-05-19)
- Enable UI service flag for disabling UI service ([#1261](https://github.com/kubeflow/spark-operator/pull/1261) by [@sairamankumar2](https://github.com/sairamankumar2))
- Add DiDi to who-is-using.md ([#1255](https://github.com/kubeflow/spark-operator/pull/1255) by [@Run-Lin](https://github.com/Run-Lin))
- doc: update who is using page ([#1251](https://github.com/kubeflow/spark-operator/pull/1251) by [@luizm](https://github.com/luizm))
- Add Tongdun under who-is-using ([#1249](https://github.com/kubeflow/spark-operator/pull/1249) by [@lomoJG](https://github.com/lomoJG))
- [#1239] Custom service port name for spark application UI ([#1240](https://github.com/kubeflow/spark-operator/pull/1240) by [@marcozov](https://github.com/marcozov))
- fix: do not remove preemptionPolicy in patcher when not present ([#1246](https://github.com/kubeflow/spark-operator/pull/1246) by [@HHK1](https://github.com/HHK1))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.1.0...spark-operator-chart-1.1.1)
## [spark-operator-chart-1.1.0](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.1.0) (2021-04-28)
- Updating Spark version from 3.0 to 3.1.1 ([#1153](https://github.com/kubeflow/spark-operator/pull/1153) by [@chethanuk](https://github.com/chethanuk))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.10...spark-operator-chart-1.1.0)
## [spark-operator-chart-1.0.10](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.10) (2021-04-28)
- Add support for blue/green deployments ([#1230](https://github.com/kubeflow/spark-operator/pull/1230) by [@flupke](https://github.com/flupke))
- Update who-is-using.md: Fossil is using Spark Operator for Production ([#1244](https://github.com/kubeflow/spark-operator/pull/1244) by [@duyet](https://github.com/duyet))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.9...spark-operator-chart-1.0.10)
## [spark-operator-chart-1.0.9](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.9) (2021-04-23)
- Link to Kubernetes Slack ([#1234](https://github.com/kubeflow/spark-operator/pull/1234) by [@jsoref](https://github.com/jsoref))
- fix: remove preemptionPolicy when priority class name is used ([#1236](https://github.com/kubeflow/spark-operator/pull/1236) by [@HHK1](https://github.com/HHK1))
- Spelling ([#1231](https://github.com/kubeflow/spark-operator/pull/1231) by [@jsoref](https://github.com/jsoref))
- Add support to expose custom ports ([#1205](https://github.com/kubeflow/spark-operator/pull/1205) by [@luizm](https://github.com/luizm)) (spec sketch below)
- Fix the error of hostAliases when there are more than 2 hostnames ([#1209](https://github.com/kubeflow/spark-operator/pull/1209) by [@cdmikechen](https://github.com/cdmikechen))
- remove multiple prefixes for 'p' ([#1210](https://github.com/kubeflow/spark-operator/pull/1210) by [@chaudhryfaisal](https://github.com/chaudhryfaisal))
- added --s3-force-path-style to force path style URLs for S3 objects ([#1206](https://github.com/kubeflow/spark-operator/pull/1206) by [@chaudhryfaisal](https://github.com/chaudhryfaisal))
- Allow custom bucket path ([#1207](https://github.com/kubeflow/spark-operator/pull/1207) by [@bribroder](https://github.com/bribroder))
- fix: Remove priority from the spec when using priority class ([#1203](https://github.com/kubeflow/spark-operator/pull/1203) by [@HHK1](https://github.com/HHK1))
- Fix go get issue with "unknown revision v0.0.0" ([#1198](https://github.com/kubeflow/spark-operator/pull/1198) by [@hongshaoyang](https://github.com/hongshaoyang))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.8...spark-operator-chart-1.0.9)
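For the custom ports support above, a hedged sketch; the `ports` field name is inferred from the PR, so verify it against the CRD:

```yaml
spec:
  driver:
    ports:
      - name: debug          # placeholder port
        containerPort: 5005
        protocol: TCP
```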
## [spark-operator-chart-1.0.8](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.8) (2021-03-07)
- Helm: Put service account into pre-install hook. ([#1155](https://github.com/kubeflow/spark-operator/pull/1155) by [@tandrup](https://github.com/tandrup))
- correct hook annotation for webhook job ([#1193](https://github.com/kubeflow/spark-operator/pull/1193) by [@chaudhryfaisal](https://github.com/chaudhryfaisal))
- Update who-is-using.md ([#1174](https://github.com/kubeflow/spark-operator/pull/1174) by [@tarek-izemrane](https://github.com/tarek-izemrane))
- add Carrefour as adopter and contributor ([#1156](https://github.com/kubeflow/spark-operator/pull/1156) by [@AliGouta](https://github.com/AliGouta))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.7...spark-operator-chart-1.0.8)
## [spark-operator-chart-1.0.7](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.7) (2021-02-05)
- fix issue #1131 ([#1142](https://github.com/kubeflow/spark-operator/pull/1142) by [@kz33](https://github.com/kz33))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.6...spark-operator-chart-1.0.7)
## [spark-operator-chart-1.0.6](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.6) (2021-02-04)
- Add Fossil to who-is-using.md ([#1152](https://github.com/kubeflow/spark-operator/pull/1152) by [@duyet](https://github.com/duyet))
- #1143 Helm issues while deploying using argocd ([#1145](https://github.com/kubeflow/spark-operator/pull/1145) by [@TomHellier](https://github.com/TomHellier))
- Include Gojek in who-is-using.md ([#1146](https://github.com/kubeflow/spark-operator/pull/1146) by [@pradithya](https://github.com/pradithya))
- add hostAliases for SparkPodSpec ([#1133](https://github.com/kubeflow/spark-operator/pull/1133) by [@ImpSy](https://github.com/ImpSy)) (spec sketch below)
- Adding MavenCode ([#1128](https://github.com/kubeflow/spark-operator/pull/1128) by [@charlesa101](https://github.com/charlesa101))
- Add MongoDB to who-is-using.md ([#1123](https://github.com/kubeflow/spark-operator/pull/1123) by [@chickenPopcorn](https://github.com/chickenPopcorn))
- update go version to 1.15 and k8s deps to v0.19.6 ([#1119](https://github.com/kubeflow/spark-operator/pull/1119) by [@stpabhi](https://github.com/stpabhi))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.5...spark-operator-chart-1.0.6)
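For the hostAliases support above, the field mirrors core/v1 `hostAliases`; a sketch with placeholder values:

```yaml
spec:
  driver:
    hostAliases:
      - ip: "10.0.0.10"
        hostnames:
          - db.internal
```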
## [spark-operator-chart-1.0.5](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.5) (2020-12-15)
- Add Prometheus container port name ([#1099](https://github.com/kubeflow/spark-operator/pull/1099) by [@nicholas-fwang](https://github.com/nicholas-fwang))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.4...spark-operator-chart-1.0.5)
## [spark-operator-chart-1.0.4](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.4) (2020-12-12)
- Upgrade the Chart version to 1.0.4 ([#1113](https://github.com/kubeflow/spark-operator/pull/1113) by [@ordukhanian](https://github.com/ordukhanian))
- Support Prometheus PodMonitor Deployment (#1106) ([#1112](https://github.com/kubeflow/spark-operator/pull/1112) by [@ordukhanian](https://github.com/ordukhanian))
- update executor status if pod is lost while app is still running ([#1111](https://github.com/kubeflow/spark-operator/pull/1111) by [@ImpSy](https://github.com/ImpSy))
- Add scheduler func for clearing batch scheduling on completed ([#1079](https://github.com/kubeflow/spark-operator/pull/1079) by [@nicholas-fwang](https://github.com/nicholas-fwang))
- Add configuration for SparkUI service type ([#1100](https://github.com/kubeflow/spark-operator/pull/1100) by [@jutley](https://github.com/jutley)) (spec sketch below)
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.3...spark-operator-chart-1.0.4)
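For the SparkUI service type option above, a sketch assuming the `sparkUIOptions.serviceType` field (the default remains a ClusterIP service):

```yaml
spec:
  sparkUIOptions:
    serviceType: NodePort
```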
## [spark-operator-chart-1.0.3](https://github.com/kubeflow/spark-operator/tree/spark-operator-chart-1.0.3) (2020-12-07)
- Update docs with new helm instructions ([#1105](https://github.com/kubeflow/spark-operator/pull/1105) by [@hagaibarel](https://github.com/hagaibarel))
[Full Changelog](https://github.com/kubeflow/spark-operator/compare/spark-operator-chart-1.0.2...spark-operator-chart-1.0.3)

View File

@ -1,3 +0,0 @@
# Code of Conduct
For the code of conduct, please refer to the [Kubeflow Community Code of Conduct](https://www.kubeflow.org/docs/about/contributing/#follow-the-code-of-conduct).

View File

@ -1,11 +1,23 @@
-# Contributing to Kubeflow Spark Operator
+# How to Contribute
-Welcome to the Kubeflow Spark Operator project. We'd love to accept your patches and contributions to this project. For detailed information about how to contribute to Kubeflow, please refer to [Contributing to Kubeflow](https://www.kubeflow.org/docs/about/contributing/).
+We'd love to accept your patches and contributions to this project. There are
+just a few small guidelines you need to follow.
-## Developer Guide
+## Contributor License Agreement
-For how to develop with the Spark operator, please refer to the [Developer Guide](https://www.kubeflow.org/docs/components/spark-operator/developer-guide/).
+Contributions to this project must be accompanied by a Contributor License
+Agreement. You (or your employer) retain the copyright to your contribution;
+this simply gives us permission to use and redistribute your contributions as
+part of the project. Head over to <https://cla.developers.google.com/> to see
+your current agreements on file or to sign a new one.
-## Code Reviews
+You generally only need to submit a CLA once, so if you've already submitted one
+(even if it was for a different project), you probably don't need to do it
+again.
-All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more information on using pull requests.
+## Code reviews
+All submissions, including submissions by project members, require review. We
+use GitHub pull requests for this purpose. Consult
+[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
+information on using pull requests.

View File

@ -14,47 +14,34 @@
# limitations under the License.
#
-ARG SPARK_IMAGE=docker.io/library/spark:4.0.0
+ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v3.1.1
-FROM golang:1.24.1 AS builder
+FROM golang:1.15.2-alpine as builder
 WORKDIR /workspace
-RUN --mount=type=cache,target=/go/pkg/mod/ \
-    --mount=type=bind,source=go.mod,target=go.mod \
-    --mount=type=bind,source=go.sum,target=go.sum \
-    go mod download
+# Copy the Go Modules manifests
+COPY go.mod go.mod
+COPY go.sum go.sum
+# Cache deps before building and copying source so that we don't need to re-download as much
+# and so that source changes don't invalidate our downloaded layer
+RUN go mod download
-COPY . .
+# Copy the go source code
+COPY main.go main.go
+COPY pkg/ pkg/
-ENV GOCACHE=/root/.cache/go-build
-ARG TARGETARCH
-RUN --mount=type=cache,target=/go/pkg/mod/ \
-    --mount=type=cache,target="/root/.cache/go-build" \
-    CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} GO111MODULE=on make build-operator
+# Build
+RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o /usr/bin/spark-operator main.go
 FROM ${SPARK_IMAGE}
-ARG SPARK_UID=185
-ARG SPARK_GID=185
 USER root
-RUN apt-get update \
-    && apt-get install -y tini \
+COPY --from=builder /usr/bin/spark-operator /usr/bin/
+RUN apt-get update --allow-releaseinfo-change \
+    && apt-get update \
+    && apt-get install -y openssl curl tini \
     && rm -rf /var/lib/apt/lists/*
-RUN mkdir -p /etc/k8s-webhook-server/serving-certs /home/spark && \
-    chmod -R g+rw /etc/k8s-webhook-server/serving-certs && \
-    chown -R spark /etc/k8s-webhook-server/serving-certs /home/spark
-USER ${SPARK_UID}:${SPARK_GID}
-COPY --from=builder /workspace/bin/spark-operator /usr/bin/spark-operator
+COPY hack/gencerts.sh /usr/bin/
+COPY entrypoint.sh /usr/bin/
+ENTRYPOINT ["/usr/bin/entrypoint.sh"]

57
Dockerfile.rh Normal file
View File

@ -0,0 +1,57 @@
# syntax=docker/dockerfile:1.0-experimental
#
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Build an OpenShift image.
# Before running docker build, make sure
# 1. Your Docker version is >= 18.09.3
# 2. export DOCKER_BUILDKIT=1
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v3.1.1
FROM golang:1.14.0-alpine as builder
WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# Cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download
# Copy the go source code
COPY main.go main.go
COPY pkg/ pkg/
# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o /usr/bin/spark-operator main.go
FROM ${SPARK_IMAGE}
COPY --from=builder /usr/bin/spark-operator /usr/bin/
USER root
# Comment out the following three lines if you do not have a RedHat subscription.
COPY hack/install_packages.sh /
RUN --mount=target=/opt/spark/credentials,type=secret,id=credentials,required /install_packages.sh
RUN rm /install_packages.sh
RUN chmod -R u+x /tmp
COPY hack/gencerts.sh /usr/bin/
COPY entrypoint.sh /usr/bin/
USER 185
ENTRYPOINT ["/usr/bin/entrypoint.sh"]

384
Makefile
View File

@ -1,347 +1,73 @@
.SILENT:
.PHONY: clean-sparkctl
# Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
ifeq (,$(shell go env GOBIN))
GOBIN=$(shell go env GOPATH)/bin
else
GOBIN=$(shell go env GOBIN)
endif
SPARK_OPERATOR_GOPATH=/go/src/github.com/GoogleCloudPlatform/spark-on-k8s-operator
DEP_VERSION:=`grep DEP_VERSION= Dockerfile | awk -F\" '{print $$2}'`
BUILDER=`grep "FROM golang:" Dockerfile | awk '{print $$2}'`
UNAME:=`uname | tr '[:upper:]' '[:lower:]'`
REPO=github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg
# Setting SHELL to bash allows bash commands to be executed by recipes.
# Options are set to exit when a recipe line exits non-zero or a piped command fails.
SHELL = /usr/bin/env bash -o pipefail
.SHELLFLAGS = -ec
all: clean-sparkctl build-sparkctl install-sparkctl
# Version information.
VERSION ?= $(shell cat VERSION | sed "s/^v//")
BUILD_DATE := $(shell date -u +"%Y-%m-%dT%H:%M:%S%:z")
GIT_COMMIT := $(shell git rev-parse HEAD)
GIT_TAG := $(shell if [ -z "`git status --porcelain`" ]; then git describe --exact-match --tags HEAD 2>/dev/null; fi)
GIT_TREE_STATE := $(shell if [ -z "`git status --porcelain`" ]; then echo "clean" ; else echo "dirty"; fi)
GIT_SHA := $(shell git rev-parse --short HEAD || echo "HEAD")
GIT_VERSION := ${VERSION}+${GIT_SHA}
build-sparkctl:
[ ! -f "sparkctl/sparkctl-darwin-amd64" ] || [ ! -f "sparkctl/sparkctl-linux-amd64" ] && \
echo building using $(BUILDER) && \
docker run -it -w $(SPARK_OPERATOR_GOPATH) \
-v $$(pwd):$(SPARK_OPERATOR_GOPATH) $(BUILDER) sh -c \
"apk add --no-cache bash git && \
cd sparkctl && \
./build.sh" || true
MODULE_PATH := $(shell awk '/^module/{print $$2; exit}' go.mod)
SPARK_OPERATOR_GOPATH := /go/src/github.com/kubeflow/spark-operator
SPARK_OPERATOR_CHART_PATH := charts/spark-operator-chart
DEP_VERSION := `grep DEP_VERSION= Dockerfile | awk -F\" '{print $$2}'`
BUILDER := `grep "FROM golang:" Dockerfile | awk '{print $$2}'`
UNAME := `uname | tr '[:upper:]' '[:lower:]'`
clean-sparkctl:
rm -f sparkctl/sparkctl-darwin-amd64 sparkctl/sparkctl-linux-amd64
# CONTAINER_TOOL defines the container tool to be used for building images.
# Be aware that the target commands are only tested with Docker which is
# scaffolded by default. However, you might want to replace it to use other
# tools. (i.e. podman)
CONTAINER_TOOL ?= docker
# Image URL to use all building/pushing image targets
IMAGE_REGISTRY ?= ghcr.io
IMAGE_REPOSITORY ?= kubeflow/spark-operator/controller
IMAGE_TAG ?= $(VERSION)
IMAGE ?= $(IMAGE_REGISTRY)/$(IMAGE_REPOSITORY):$(IMAGE_TAG)
# Kind cluster
KIND_CLUSTER_NAME ?= spark-operator
KIND_CONFIG_FILE ?= charts/spark-operator-chart/ci/kind-config.yaml
KIND_KUBE_CONFIG ?= $(HOME)/.kube/config
## Location to install binaries
LOCALBIN ?= $(shell pwd)/bin
## Versions
KUSTOMIZE_VERSION ?= v5.4.1
CONTROLLER_TOOLS_VERSION ?= v0.17.1
KIND_VERSION ?= v0.23.0
KIND_K8S_VERSION ?= v1.32.0
ENVTEST_VERSION ?= release-0.20
# ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
ENVTEST_K8S_VERSION ?= 1.32.0
GOLANGCI_LINT_VERSION ?= v2.1.6
GEN_CRD_API_REFERENCE_DOCS_VERSION ?= v0.3.0
HELM_VERSION ?= v3.15.3
HELM_UNITTEST_VERSION ?= 0.5.1
HELM_DOCS_VERSION ?= v1.14.2
CODE_GENERATOR_VERSION ?= v0.33.1
## Binaries
SPARK_OPERATOR ?= $(LOCALBIN)/spark-operator
KUBECTL ?= kubectl
KUSTOMIZE ?= $(LOCALBIN)/kustomize-$(KUSTOMIZE_VERSION)
CONTROLLER_GEN ?= $(LOCALBIN)/controller-gen-$(CONTROLLER_TOOLS_VERSION)
KIND ?= $(LOCALBIN)/kind-$(KIND_VERSION)
ENVTEST ?= $(LOCALBIN)/setup-envtest-$(ENVTEST_VERSION)
GOLANGCI_LINT ?= $(LOCALBIN)/golangci-lint-$(GOLANGCI_LINT_VERSION)
GEN_CRD_API_REFERENCE_DOCS ?= $(LOCALBIN)/gen-crd-api-reference-docs-$(GEN_CRD_API_REFERENCE_DOCS_VERSION)
HELM ?= $(LOCALBIN)/helm-$(HELM_VERSION)
HELM_DOCS ?= $(LOCALBIN)/helm-docs-$(HELM_DOCS_VERSION)
##@ General
# The help target prints out all targets with their descriptions organized
# beneath their categories. The categories are represented by '##@' and the
# target descriptions by '##'. The awk command is responsible for reading the
# entire set of makefiles included in this invocation, looking for lines of the
# file as xyz: ## something, and then pretty-format the target and help. Then,
# if there's a line with ##@ something, that gets pretty-printed as a category.
# More info on the usage of ANSI control characters for terminal formatting:
# https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_parameters
# More info on the awk command:
# http://linuxcommand.org/lc3_adv_awk.php
.PHONY: help
help: ## Display this help.
@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-30s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
.PHONY: version
version: ## Print version information.
@echo "Version: ${VERSION}"
@echo "Build Date: ${BUILD_DATE}"
@echo "Git Commit: ${GIT_COMMIT}"
@echo "Git Tag: ${GIT_TAG}"
@echo "Git Tree State: ${GIT_TREE_STATE}"
@echo "Git SHA: ${GIT_SHA}"
@echo "Git Version: ${GIT_VERSION}"
.PHONY: print-%
print-%: ; @echo $*=$($*)
##@ Development
.PHONY: manifests
manifests: controller-gen ## Generate CustomResourceDefinition, RBAC and WebhookConfiguration manifests.
$(CONTROLLER_GEN) crd:generateEmbeddedObjectMeta=true rbac:roleName=spark-operator-controller webhook paths="./..." output:crd:artifacts:config=config/crd/bases
.PHONY: generate
generate: controller-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
$(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./..."
.PHONY: update-crd
update-crd: manifests ## Update CRD files in the Helm chart.
cp config/crd/bases/* charts/spark-operator-chart/crds/
.PHONY: verify-codegen
verify-codegen: $(LOCALBIN) ## Install code-generator commands and verify changes
$(call go-install-tool,$(LOCALBIN)/register-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/register-gen,$(CODE_GENERATOR_VERSION))
$(call go-install-tool,$(LOCALBIN)/client-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/client-gen,$(CODE_GENERATOR_VERSION))
$(call go-install-tool,$(LOCALBIN)/lister-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/lister-gen,$(CODE_GENERATOR_VERSION))
$(call go-install-tool,$(LOCALBIN)/informer-gen-$(CODE_GENERATOR_VERSION),k8s.io/code-generator/cmd/informer-gen,$(CODE_GENERATOR_VERSION))
./hack/verify-codegen.sh
.PHONY: go-clean
go-clean: ## Clean up caches and output.
@echo "cleaning up caches and output"
go clean -cache -testcache -r -x 2>&1 >/dev/null
-rm -rf _output
.PHONY: go-fmt
go-fmt: ## Run go fmt against code.
@echo "Running go fmt..."
if [ -n "$(shell go fmt ./...)" ]; then \
echo "Go code is not formatted, need to run \"make go-fmt\" and commit the changes."; \
false; \
install-sparkctl: | sparkctl/sparkctl-darwin-amd64 sparkctl/sparkctl-linux-amd64
@if [ "$(UNAME)" = "linux" ]; then \
echo "installing linux binary to /usr/local/bin/sparkctl"; \
sudo cp sparkctl/sparkctl-linux-amd64 /usr/local/bin/sparkctl; \
sudo chmod +x /usr/local/bin/sparkctl; \
elif [ "$(UNAME)" = "darwin" ]; then \
echo "installing macOS binary to /usr/local/bin/sparkctl"; \
cp sparkctl/sparkctl-darwin-amd64 /usr/local/bin/sparkctl; \
chmod +x /usr/local/bin/sparkctl; \
else \
echo "Go code is formatted."; \
echo "$(UNAME) not supported"; \
fi
.PHONY: go-vet
go-vet: ## Run go vet against code.
@echo "Running go vet..."
go vet ./...
.PHONY: go-lint
go-lint: golangci-lint ## Run golangci-lint linter.
@echo "Running golangci-lint run..."
$(GOLANGCI_LINT) run
.PHONY: go-lint-fix
go-lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes.
@echo "Running golangci-lint run --fix..."
$(GOLANGCI_LINT) run --fix
.PHONY: unit-test
unit-test: envtest ## Run unit tests.
@echo "Running unit tests..."
KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" \
go test $(shell go list ./... | grep -v /e2e) -coverprofile cover.out
.PHONY: e2e-test
e2e-test: envtest ## Run the e2e tests against a Kind k8s instance that is spun up.
@echo "Running e2e tests..."
go test ./test/e2e/ -v -ginkgo.v -timeout 30m
##@ Build
override LDFLAGS += \
-X ${MODULE_PATH}.version=${GIT_VERSION} \
-X ${MODULE_PATH}.buildDate=${BUILD_DATE} \
-X ${MODULE_PATH}.gitCommit=${GIT_COMMIT} \
-X ${MODULE_PATH}.gitTreeState=${GIT_TREE_STATE} \
-extldflags "-static"
.PHONY: build-operator
build-operator: ## Build Spark operator.
echo "Building spark-operator binary..."
CGO_ENABLED=0 go build -o $(SPARK_OPERATOR) -ldflags '${LDFLAGS}' cmd/operator/main.go
.PHONY: clean
clean: ## Clean binaries.
rm -f $(SPARK_OPERATOR)
.PHONY: build-api-docs
build-api-docs: gen-crd-api-reference-docs ## Build api documentation.
$(GEN_CRD_API_REFERENCE_DOCS) \
-config hack/api-docs/config.json \
-api-dir github.com/kubeflow/spark-operator/v2/api/v1beta2 \
-template-dir hack/api-docs/template \
-out-file docs/api-docs.md
build-api-docs:
hack/api-ref-docs \
-config hack/api-docs-config.json \
-api-dir github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/apis/sparkoperator.k8s.io/v1beta2 \
-template-dir hack/api-docs-template \
-out-file docs/api-docs.md
# If you wish to build the operator image targeting other platforms you can use the --platform flag.
# (i.e. docker build --platform linux/arm64). However, you must enable docker buildKit for it.
# More info: https://docs.docker.com/develop/develop-images/build_enhancements/
.PHONY: docker-build
docker-build: ## Build docker image with the operator.
$(CONTAINER_TOOL) build -t ${IMAGE} .
helm-docs:
helm-docs -c ./charts
.PHONY: docker-push
docker-push: ## Push docker image with the operator.
$(CONTAINER_TOOL) push ${IMAGE}
fmt-check: clean
@echo "running fmt check"
./.travis.gofmt.sh
# PLATFORMS defines the target platforms for the operator image be built to provide support to multiple
# architectures. (i.e. make docker-buildx IMG=myregistry/mypoperator:0.0.1). To use this option you need to:
# - be able to use docker buildx. More info: https://docs.docker.com/build/buildx/
# - have enabled BuildKit. More info: https://docs.docker.com/develop/develop-images/build_enhancements/
# - be able to push the image to your registry (i.e. if you do not set a valid value via IMG=<myregistry/image:<tag>> then the export will fail)
# To adequately provide solutions that are compatible with multiple platforms, you should consider using this option.
PLATFORMS ?= linux/amd64,linux/arm64
.PHONY: docker-buildx
docker-buildx: ## Build and push docker image for the operator for cross-platform support
- $(CONTAINER_TOOL) buildx create --name spark-operator-builder
$(CONTAINER_TOOL) buildx use spark-operator-builder
- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMAGE} -f Dockerfile .
- $(CONTAINER_TOOL) buildx rm spark-operator-builder
detect-crds-drift:
diff -q charts/spark-operator-chart/crds manifest/crds --exclude=kustomization.yaml
##@ Helm
clean:
@echo "cleaning up caches and output"
go clean -cache -testcache -r -x ./... 2>&1 >/dev/null
-rm -rf _output
.PHONY: detect-crds-drift
detect-crds-drift: manifests ## Detect CRD drift.
diff -q $(SPARK_OPERATOR_CHART_PATH)/crds config/crd/bases
test: clean
@echo "running unit tests"
go test -v ./... -covermode=atomic
.PHONY: helm-unittest
helm-unittest: helm-unittest-plugin ## Run Helm chart unittests.
$(HELM) unittest $(SPARK_OPERATOR_CHART_PATH) --strict --file "tests/**/*_test.yaml"
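# The tests above are discovered under $(SPARK_OPERATOR_CHART_PATH)/tests. As a
# hedged sketch only (suite name, template, and assertion are illustrative, not
# taken from this repo), a minimal helm-unittest test file looks like:
#
#   suite: test spark-operator deployment
#   templates:
#     - deployment.yaml
#   tests:
#     - it: should render a Deployment
#       asserts:
#         - isKind:
#             of: Deployment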
.PHONY: helm-lint
helm-lint: ## Run Helm chart lint test.
docker run --rm --workdir /workspace --volume "$$(pwd):/workspace" quay.io/helmpack/chart-testing:latest ct lint --target-branch master --validate-maintainers=false
it-test: clean all
@echo "running unit tests"
go test -v ./test/e2e/ --kubeconfig "$HOME/.kube/config" --operator-image=gcr.io/spark-operator/spark-operator:v1beta2-1.2.3-3.1.1
.PHONY: helm-docs
helm-docs: helm-docs-plugin ## Generates markdown documentation for helm charts from requirements and values files.
$(HELM_DOCS) --sort-values-order=file
##@ Deployment
ifndef ignore-not-found
ignore-not-found = false
endif
.PHONY: kind-create-cluster
kind-create-cluster: kind ## Create a kind cluster for integration tests.
if ! $(KIND) get clusters 2>/dev/null | grep -q "^$(KIND_CLUSTER_NAME)$$"; then \
$(KIND) create cluster \
--name $(KIND_CLUSTER_NAME) \
--config $(KIND_CONFIG_FILE) \
--image kindest/node:$(KIND_K8S_VERSION) \
--kubeconfig $(KIND_KUBE_CONFIG) \
--wait=1m; \
fi
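# The cluster is created from $(KIND_CONFIG_FILE), whose contents are not shown
# in this diff; as an illustrative sketch, a minimal kind config of the sort it
# points to would look like:
#
#   kind: Cluster
#   apiVersion: kind.x-k8s.io/v1alpha4
#   nodes:
#     - role: control-plane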
.PHONY: kind-load-image
kind-load-image: kind-create-cluster docker-build ## Load the image into the kind cluster.
$(KIND) load docker-image --name $(KIND_CLUSTER_NAME) $(IMAGE)
.PHONY: kind-delete-cluster
kind-delete-cluster: kind ## Delete the created kind cluster.
$(KIND) delete cluster --name $(KIND_CLUSTER_NAME) --kubeconfig $(KIND_KUBE_CONFIG)
.PHONY: install-crd
install-crd: manifests kustomize ## Install CRDs into the K8s cluster specified in ~/.kube/config.
$(KUSTOMIZE) build config/crd | $(KUBECTL) create -f - 2>/dev/null || $(KUSTOMIZE) build config/crd | $(KUBECTL) replace -f -
.PHONY: uninstall-crd
uninstall-crd: manifests kustomize ## Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
$(KUSTOMIZE) build config/crd | $(KUBECTL) delete --ignore-not-found=$(ignore-not-found) -f -
.PHONY: deploy
deploy: IMAGE_TAG=local
deploy: helm manifests update-crd kind-load-image ## Deploy controller to the K8s cluster specified in ~/.kube/config.
$(HELM) upgrade --install -f charts/spark-operator-chart/ci/ci-values.yaml spark-operator ./charts/spark-operator-chart/
.PHONY: undeploy
undeploy: helm ## Uninstall spark-operator
$(HELM) uninstall spark-operator
##@ Dependencies
$(LOCALBIN):
mkdir -p $(LOCALBIN)
.PHONY: kustomize
kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
$(KUSTOMIZE): $(LOCALBIN)
$(call go-install-tool,$(KUSTOMIZE),sigs.k8s.io/kustomize/kustomize/v5,$(KUSTOMIZE_VERSION))
.PHONY: controller-gen
controller-gen: $(CONTROLLER_GEN) ## Download controller-gen locally if necessary.
$(CONTROLLER_GEN): $(LOCALBIN)
$(call go-install-tool,$(CONTROLLER_GEN),sigs.k8s.io/controller-tools/cmd/controller-gen,$(CONTROLLER_TOOLS_VERSION))
.PHONY: kind
kind: $(KIND) ## Download kind locally if necessary.
$(KIND): $(LOCALBIN)
$(call go-install-tool,$(KIND),sigs.k8s.io/kind,$(KIND_VERSION))
.PHONY: envtest
envtest: $(ENVTEST) ## Download setup-envtest locally if necessary.
$(ENVTEST): $(LOCALBIN)
$(call go-install-tool,$(ENVTEST),sigs.k8s.io/controller-runtime/tools/setup-envtest,$(ENVTEST_VERSION))
.PHONY: golangci-lint
golangci-lint: $(GOLANGCI_LINT) ## Download golangci-lint locally if necessary.
$(GOLANGCI_LINT): $(LOCALBIN)
$(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/v2/cmd/golangci-lint,${GOLANGCI_LINT_VERSION})
.PHONY: gen-crd-api-reference-docs
gen-crd-api-reference-docs: $(GEN_CRD_API_REFERENCE_DOCS) ## Download gen-crd-api-reference-docs locally if necessary.
$(GEN_CRD_API_REFERENCE_DOCS): $(LOCALBIN)
$(call go-install-tool,$(GEN_CRD_API_REFERENCE_DOCS),github.com/ahmetb/gen-crd-api-reference-docs,$(GEN_CRD_API_REFERENCE_DOCS_VERSION))
.PHONY: helm
helm: $(HELM) ## Download helm locally if necessary.
$(HELM): $(LOCALBIN)
$(call go-install-tool,$(HELM),helm.sh/helm/v3/cmd/helm,$(HELM_VERSION))
.PHONY: helm-unittest-plugin
helm-unittest-plugin: helm ## Download helm unittest plugin locally if necessary.
if [ -z "$(shell $(HELM) plugin list | grep unittest)" ]; then \
echo "Installing helm unittest plugin"; \
$(HELM) plugin install https://github.com/helm-unittest/helm-unittest.git --version $(HELM_UNITTEST_VERSION); \
fi
.PHONY: helm-docs-plugin
helm-docs-plugin: $(HELM_DOCS) ## Download helm-docs plugin locally if necessary.
$(HELM_DOCS): $(LOCALBIN)
$(call go-install-tool,$(HELM_DOCS),github.com/norwoodj/helm-docs/cmd/helm-docs,$(HELM_DOCS_VERSION))
# go-install-tool will 'go install' any package with custom target and name of binary, if it doesn't exist
# $1 - target path with name of binary (ideally with version)
# $2 - package url which can be installed
# $3 - specific version of package
define go-install-tool
@[ -f $(1) ] || { \
set -e; \
package=$(2)@$(3) ;\
echo "Downloading $${package}" ;\
GOBIN=$(LOCALBIN) go install $${package} ;\
mv "$$(echo "$(1)" | sed "s/-$(3)$$//")" $(1) ;\
}
endef
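# Example use (mirroring the tool targets above): install a pinned kustomize
# into $(LOCALBIN) only if the versioned binary is not already there:
#   $(call go-install-tool,$(KUSTOMIZE),sigs.k8s.io/kustomize/kustomize/v5,$(KUSTOMIZE_VERSION))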
vet:
@echo "running go vet"
# echo "Building using $(BUILDER)"
# go vet ./...
go vet $(REPO)...

OWNERS

@ -1,10 +0,0 @@
approvers:
- andreyvelich
- ChenYi015
- jacobsalway
- mwielgus
- vara-bonthu
- yuchaoran2011
reviewers:
- ImpSy
- nabuskey

PROJECT

@ -1,39 +0,0 @@
# Code generated by tool. DO NOT EDIT.
# This file is used to track the info used to scaffold your project
# and allow the plugins properly work.
# More info: https://book.kubebuilder.io/reference/project-config.html
domain: sparkoperator.k8s.io
layout:
- go.kubebuilder.io/v4
projectName: spark-operator
repo: github.com/kubeflow/spark-operator
resources:
- api:
crdVersion: v1
namespaced: true
controller: true
domain: sparkoperator.k8s.io
kind: SparkConnect
path: github.com/kubeflow/spark-operator/api/v1alpha1
version: v1alpha1
- api:
crdVersion: v1
namespaced: true
controller: true
domain: sparkoperator.k8s.io
kind: SparkApplication
path: github.com/kubeflow/spark-operator/api/v1beta2
version: v1beta2
webhooks:
defaulting: true
validation: true
webhookVersion: v1
- api:
crdVersion: v1
namespaced: true
controller: true
domain: sparkoperator.k8s.io
kind: ScheduledSparkApplication
path: github.com/kubeflow/spark-operator/api/v1beta2
version: v1beta2
version: "3"

README.md

@ -1,42 +1,78 @@
# Kubeflow Spark Operator
[![Build Status](https://travis-ci.org/GoogleCloudPlatform/spark-on-k8s-operator.svg?branch=master)](https://travis-ci.org/GoogleCloudPlatform/spark-on-k8s-operator)
[![Go Report Card](https://goreportcard.com/badge/github.com/GoogleCloudPlatform/spark-on-k8s-operator)](https://goreportcard.com/report/github.com/GoogleCloudPlatform/spark-on-k8s-operator)
[![Integration Test](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)
[![GitHub release](https://img.shields.io/github/v/release/kubeflow/spark-operator)](https://github.com/kubeflow/spark-operator/releases)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/10524/badge)](https://www.bestpractices.dev/projects/10524)
**This is not an officially supported Google product.**
## What is Spark Operator?
## Community
The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses
[Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.
* Join our [Slack](https://kubernetes.slack.com/messages/CALBDHMTL) channel on [Kubernetes on Slack](https://slack.k8s.io/).
* Check out [who is using the Kubernetes Operator for Apache Spark](docs/who-is-using.md).
## Quick Start
## Project Status
For a more detailed guide, please refer to the [Getting Started guide](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).
**Project status:** *beta*
**Current API version:** *`v1beta2`*
**If you are currently using the `v1beta1` version of the APIs in your manifests, please update them to use the `v1beta2` version by changing `apiVersion: "sparkoperator.k8s.io/<version>"` to `apiVersion: "sparkoperator.k8s.io/v1beta2"`. You will also need to delete the `previous` version of the CustomResourceDefinitions named `sparkapplications.sparkoperator.k8s.io` and `scheduledsparkapplications.sparkoperator.k8s.io`, and replace them with the `v1beta2` version either by installing the latest version of the operator or by running `kubectl create -f manifest/crds`.**
Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes [Mutating Admission Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/), which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm [chart](charts/spark-operator-chart). Check out the [Quick Start Guide](docs/quick-start-guide.md#using-the-mutating-admission-webhook) on how to enable the webhook.
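As a rough sketch of enabling it via Helm values (the `webhook.enable` key is an assumption about the chart of that era; the chart's [README](charts/spark-operator-chart/README.md) is authoritative):

```yaml
# values.yaml: opt in to the mutating admission webhook (key name assumed)
webhook:
  enable: true
```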
## Prerequisites
* Version >= 1.13 of Kubernetes to use the [`subresource` support for CustomResourceDefinitions](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#subresources), which became beta in 1.13 and is enabled by default in 1.13 and higher.
## Installation
The easiest way to install the Kubernetes Operator for Apache Spark is to use the Helm [chart](charts/spark-operator-chart/).
```bash
# Add the Helm repository
helm repo add --force-update spark-operator https://kubeflow.github.io/spark-operator
$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace \
--wait
# Create an example application in the default namespace
kubectl apply -f https://raw.githubusercontent.com/kubeflow/spark-operator/refs/heads/master/examples/spark-pi.yaml
# Get the status of the application
kubectl get sparkapp spark-pi
# Delete the application
kubectl delete sparkapp spark-pi
$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace
```
This will install the Kubernetes Operator for Apache Spark into the namespace `spark-operator`. By default, the operator watches and handles `SparkApplication`s in every namespace. If you would like to limit the operator to watching and handling `SparkApplication`s in a single namespace, e.g. `default`, add the following option to the `helm install` command:
```
--set sparkJobNamespace=default
```
For configuration options available in the Helm chart, please refer to the chart's [README](charts/spark-operator-chart/README.md).
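The same restriction can live in a values file instead of a `--set` flag; a minimal sketch reusing the `sparkJobNamespace` key from above:

```yaml
# values.yaml: limit the operator to a single namespace
sparkJobNamespace: default
```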
## Version Matrix
The following table lists the most recent few versions of the operator.
| Operator Version | API Version | Kubernetes Version | Base Spark Version | Operator Image Tag |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| `latest` (master HEAD) | `v1beta2` | 1.13+ | `3.0.0` | `latest` |
| `v1beta2-1.2.3-3.1.1` | `v1beta2` | 1.13+ | `3.1.1` | `v1beta2-1.2.3-3.1.1` |
| `v1beta2-1.2.0-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` | `v1beta2-1.2.0-3.0.0` |
| `v1beta2-1.1.2-2.4.5` | `v1beta2` | 1.13+ | `2.4.5` | `v1beta2-1.1.2-2.4.5` |
| `v1beta2-1.0.1-2.4.4` | `v1beta2` | 1.13+ | `2.4.4` | `v1beta2-1.0.1-2.4.4` |
| `v1beta2-1.0.0-2.4.4` | `v1beta2` | 1.13+ | `2.4.4` | `v1beta2-1.0.0-2.4.4` |
| `v1beta1-0.9.0` | `v1beta1` | 1.13+ | `2.4.0` | `v2.4.0-v1beta1-0.9.0` |
When installing using the Helm chart, you can choose to use a specific image tag instead of the default one, using the following option:
```
--set image.tag=<operator image tag>
```
## Get Started
Get started quickly with the Kubernetes Operator for Apache Spark using the [Quick Start Guide](docs/quick-start-guide.md).
If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](docs/gcp.md).
For more information, check the [Design](docs/design.md), [API Specification](docs/api-docs.md) and detailed [User Guide](docs/user-guide.md).
## Overview
For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [Architecture](https://www.kubeflow.org/docs/components/spark-operator/overview/#architecture). It requires Spark 2.3 or above, which supports Kubernetes as a native scheduler backend.
The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses
[Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
for specifying, running, and surfacing status of Spark applications. For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [design doc](docs/design.md). It requires Spark 2.3 or above, which supports Kubernetes as a native scheduler backend.
The Kubernetes Operator for Apache Spark currently supports the following list of features:
@ -48,66 +84,10 @@ The Kubernetes Operator for Apache Spark currently supports the following list o
* Supports automatic application re-submission for updated `SparkApplication` objects with updated specification.
* Supports automatic application restart with a configurable restart policy.
* Supports automatic retries of failed submissions with optional linear back-off.
* Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via `sparkctl`.
* Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via `sparkctl`.
* Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
## Project Status
## Contributing
**Project status:** *beta*
**Current API version:** *`v1beta2`*
**If you are currently using the `v1beta1` version of the APIs in your manifests, please update them to use the `v1beta2` version by changing `apiVersion: "sparkoperator.k8s.io/<version>"` to `apiVersion: "sparkoperator.k8s.io/v1beta2"`. You will also need to delete the `previous` version of the CustomResourceDefinitions named `sparkapplications.sparkoperator.k8s.io` and `scheduledsparkapplications.sparkoperator.k8s.io`, and replace them with the `v1beta2` version either by installing the latest version of the operator or by running `kubectl create -f config/crd/bases`.**
## Prerequisites
* Version >= 1.13 of Kubernetes to use the [`subresource` support for CustomResourceDefinitions](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#subresources), which became beta in 1.13 and is enabled by default in 1.13 and higher.
* Version >= 1.16 of Kubernetes to use the `MutatingWebhook` and `ValidatingWebhook` of `apiVersion: admissionregistration.k8s.io/v1`.
## Getting Started
For getting started with Spark operator, please refer to [Getting Started](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).
## User Guide
For detailed user guide and API documentation, please refer to [User Guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/) and [API Specification](docs/api-docs.md).
If you are running Spark operator on Google Kubernetes Engine (GKE) and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/gcp/).
## Version Matrix
The following table lists the most recent few versions of the operator.
| Operator Version | API Version | Kubernetes Version | Base Spark Version |
|-----------------------|-------------|--------------------|--------------------|
| `v2.2.x` | `v1beta2` | 1.16+ | `3.5.5` |
| `v2.1.x` | `v1beta2` | 1.16+ | `3.5.3` |
| `v2.0.x` | `v1beta2` | 1.16+ | `3.5.2` |
| `v1beta2-1.6.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.5.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.4.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.3.x-3.1.1` | `v1beta2` | 1.16+ | `3.1.1` |
| `v1beta2-1.2.3-3.1.1` | `v1beta2` | 1.13+ | `3.1.1` |
| `v1beta2-1.2.2-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.1-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.0-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.1.x-2.4.5` | `v1beta2` | 1.13+ | `2.4.5` |
| `v1beta2-1.0.x-2.4.4` | `v1beta2` | 1.13+ | `2.4.4` |
## Developer Guide
For developing with Spark Operator, please refer to [Developer Guide](https://www.kubeflow.org/docs/components/spark-operator/developer-guide/).
## Contributor Guide
For contributing to Spark Operator, please refer to [Contributor Guide](CONTRIBUTING.md).
## Community
* Join the [CNCF Slack Channel](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) and then join `#kubeflow-spark-operator` Channel.
* Check out our blog post [Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community](https://blog.kubeflow.org/operators/2024/04/15/kubeflow-spark-operator.html).
* Join our monthly community meeting [Kubeflow Spark Operator Meeting Notes](https://bit.ly/3VGzP4n).
## Adopters
Check out [adopters of Spark Operator](ADOPTERS.md).
Please check out [CONTRIBUTING.md](CONTRIBUTING.md) and the [Developer Guide](docs/developer-guide.md).


@ -1 +0,0 @@
v2.2.1


@ -1,82 +0,0 @@
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1alpha1
// DeployMode describes the type of deployment of a Spark application.
type DeployMode string
// Different types of deployments.
const (
DeployModeCluster DeployMode = "cluster"
DeployModeClient DeployMode = "client"
)
// DriverState tells the current state of a spark driver.
type DriverState string
// Different states a spark driver may have.
const (
DriverStatePending DriverState = "PENDING"
DriverStateRunning DriverState = "RUNNING"
DriverStateCompleted DriverState = "COMPLETED"
DriverStateFailed DriverState = "FAILED"
DriverStateUnknown DriverState = "UNKNOWN"
)
// ExecutorState tells the current state of an executor.
type ExecutorState string
// Different states an executor may have.
const (
ExecutorStatePending ExecutorState = "PENDING"
ExecutorStateRunning ExecutorState = "RUNNING"
ExecutorStateCompleted ExecutorState = "COMPLETED"
ExecutorStateFailed ExecutorState = "FAILED"
ExecutorStateUnknown ExecutorState = "UNKNOWN"
)
// DynamicAllocation contains configuration options for dynamic allocation.
type DynamicAllocation struct {
// Enabled controls whether dynamic allocation is enabled or not.
// +optional
Enabled bool `json:"enabled,omitempty"`
// InitialExecutors is the initial number of executors to request. If .spec.executor.instances
// is also set, the initial number of executors is set to the bigger of that and this option.
// +optional
InitialExecutors *int32 `json:"initialExecutors,omitempty"`
// MinExecutors is the lower bound for the number of executors if dynamic allocation is enabled.
// +optional
MinExecutors *int32 `json:"minExecutors,omitempty"`
// MaxExecutors is the upper bound for the number of executors if dynamic allocation is enabled.
// +optional
MaxExecutors *int32 `json:"maxExecutors,omitempty"`
// ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
// the need for an external shuffle service. This option will try to keep alive executors that are storing
// shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
// ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
// +optional
ShuffleTrackingEnabled *bool `json:"shuffleTrackingEnabled,omitempty"`
// ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
// shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
// +optional
ShuffleTrackingTimeout *int64 `json:"shuffleTrackingTimeout,omitempty"`
}
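For orientation, these fields surface in a manifest roughly as follows (the values are illustrative only, derived from the field docs rather than from an example in this repo):

```yaml
dynamicAllocation:
  enabled: true
  initialExecutors: 2            # raised to .spec.executor.instances if that is larger
  minExecutors: 1
  maxExecutors: 10
  shuffleTrackingEnabled: true   # default when dynamic allocation is enabled
  shuffleTrackingTimeout: 300000 # milliseconds
```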


@ -1,36 +0,0 @@
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Package v1alpha1 contains API Schema definitions for the v1alpha1 API group
// +kubebuilder:object:generate=true
// +groupName=sparkoperator.k8s.io
package v1alpha1
import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)
var (
// GroupVersion is group version used to register these objects.
GroupVersion = schema.GroupVersion{Group: "sparkoperator.k8s.io", Version: "v1alpha1"}
// SchemeBuilder is used to add go types to the GroupVersionKind scheme.
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)


@ -1,185 +0,0 @@
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1alpha1
import (
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func init() {
SchemeBuilder.Register(&SparkConnect{}, &SparkConnectList{})
}
// +kubebuilder:object:root=true
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=https://github.com/kubeflow/spark-operator/pull/1298"
// +kubebuilder:resource:scope=Namespaced,shortName=sparkconn,singular=sparkconnect
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:JSONPath=.metadata.creationTimestamp,name=Age,type=date
// SparkConnect is the Schema for the sparkconnections API.
type SparkConnect struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata"`
Spec SparkConnectSpec `json:"spec"`
Status SparkConnectStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// SparkConnectList contains a list of SparkConnect.
type SparkConnectList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []SparkConnect `json:"items"`
}
// SparkConnectSpec defines the desired state of SparkConnect.
type SparkConnectSpec struct {
// SparkVersion is the version of Spark that the Spark Connect server uses.
SparkVersion string `json:"sparkVersion"`
// Image is the container image for the driver, executor, and init-container. Any custom container images for the
// driver, executor, or init-container take precedence over this.
// +optional
Image *string `json:"image,omitempty"`
// HadoopConf carries user-specified Hadoop configuration properties as they would use the "--conf" option
// in spark-submit. The SparkApplication controller automatically adds prefix "spark.hadoop." to Hadoop
// configuration properties.
// +optional
HadoopConf map[string]string `json:"hadoopConf,omitempty"`
// SparkConf carries user-specified Spark configuration properties as they would use the "--conf" option in
// spark-submit.
// +optional
SparkConf map[string]string `json:"sparkConf,omitempty"`
// Server is the Spark connect server specification.
Server ServerSpec `json:"server"`
// Executor is the Spark executor specification.
Executor ExecutorSpec `json:"executor"`
// DynamicAllocation configures dynamic allocation that becomes available for the Kubernetes
// scheduler backend since Spark 3.0.
// +optional
DynamicAllocation *DynamicAllocation `json:"dynamicAllocation,omitempty"`
}
// ServerSpec is specification of the Spark connect server.
type ServerSpec struct {
SparkPodSpec `json:",inline"`
}
// ExecutorSpec is specification of the executor.
type ExecutorSpec struct {
SparkPodSpec `json:",inline"`
// Instances is the number of executor instances.
// +optional
// +kubebuilder:validation:Minimum=0
Instances *int32 `json:"instances,omitempty"`
}
// SparkPodSpec defines common things that can be customized for a Spark driver or executor pod.
type SparkPodSpec struct {
// Cores maps to `spark.driver.cores` or `spark.executor.cores` for the driver and executors, respectively.
// +optional
// +kubebuilder:validation:Minimum=1
Cores *int32 `json:"cores,omitempty"`
// Memory is the amount of memory to request for the pod.
// +optional
Memory *string `json:"memory,omitempty"`
// Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
// Spark version >= 3.0.0 is required.
// Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
// +optional
// +kubebuilder:validation:Schemaless
// +kubebuilder:validation:Type:=object
// +kubebuilder:pruning:PreserveUnknownFields
Template *corev1.PodTemplateSpec `json:"template,omitempty"`
}
// SparkConnectStatus defines the observed state of SparkConnect.
type SparkConnectStatus struct {
// Represents the latest available observations of a SparkConnect's current state.
// +patchMergeKey=type
// +patchStrategy=merge
// +listType=map
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty" patchMergeKey:"type" patchStrategy:"merge"`
// State represents the current state of the SparkConnect.
State SparkConnectState `json:"state,omitempty"`
// Server represents the current state of the SparkConnect server.
Server SparkConnectServerStatus `json:"server,omitempty"`
// Executors represents the current state of the SparkConnect executors.
Executors map[string]int `json:"executors,omitempty"`
// StartTime is the time at which the SparkConnect controller started processing the SparkConnect.
StartTime metav1.Time `json:"startTime,omitempty"`
// LastUpdateTime is the time at which the SparkConnect controller last updated the SparkConnect.
LastUpdateTime metav1.Time `json:"lastUpdateTime,omitempty"`
}
// SparkConnectConditionType represents the condition types of the SparkConnect.
type SparkConnectConditionType string
// All possible condition types of the SparkConnect.
const (
SparkConnectConditionServerPodReady SparkConnectConditionType = "ServerPodReady"
)
// SparkConnectConditionReason represents the reason of SparkConnect conditions.
type SparkConnectConditionReason string
// All possible reasons of SparkConnect conditions.
const (
SparkConnectConditionReasonServerPodReady SparkConnectConditionReason = "ServerPodReady"
SparkConnectConditionReasonServerPodNotReady SparkConnectConditionReason = "ServerPodNotReady"
)
// SparkConnectState represents the current state of the SparkConnect.
type SparkConnectState string
// All possible states of the SparkConnect.
const (
SparkConnectStateNew SparkConnectState = ""
SparkConnectStateProvisioning SparkConnectState = "Provisioning"
SparkConnectStateReady SparkConnectState = "Ready"
SparkConnectStateNotReady SparkConnectState = "NotReady"
SparkConnectStateFailed SparkConnectState = "Failed"
)
type SparkConnectServerStatus struct {
// PodName is the name of the pod that is running the Spark Connect server.
PodName string `json:"podName,omitempty"`
// PodIP is the IP address of the pod that is running the Spark Connect server.
PodIP string `json:"podIp,omitempty"`
// ServiceName is the name of the service that is exposing the Spark Connect server.
ServiceName string `json:"serviceName,omitempty"`
}
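Putting the types above together, a minimal `SparkConnect` manifest would look roughly like this sketch (the name, Spark version, and pod sizes are illustrative, not taken from this repo's examples):

```yaml
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect-example   # hypothetical name
spec:
  sparkVersion: "4.0.0"         # illustrative version string
  server:
    cores: 1
    memory: "1g"
  executor:
    instances: 2
    cores: 1
    memory: "1g"
```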


@ -1,281 +0,0 @@
//go:build !ignore_autogenerated
/*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Code generated by controller-gen. DO NOT EDIT.
package v1alpha1
import (
"k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
)
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *DynamicAllocation) DeepCopyInto(out *DynamicAllocation) {
*out = *in
if in.InitialExecutors != nil {
in, out := &in.InitialExecutors, &out.InitialExecutors
*out = new(int32)
**out = **in
}
if in.MinExecutors != nil {
in, out := &in.MinExecutors, &out.MinExecutors
*out = new(int32)
**out = **in
}
if in.MaxExecutors != nil {
in, out := &in.MaxExecutors, &out.MaxExecutors
*out = new(int32)
**out = **in
}
if in.ShuffleTrackingEnabled != nil {
in, out := &in.ShuffleTrackingEnabled, &out.ShuffleTrackingEnabled
*out = new(bool)
**out = **in
}
if in.ShuffleTrackingTimeout != nil {
in, out := &in.ShuffleTrackingTimeout, &out.ShuffleTrackingTimeout
*out = new(int64)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DynamicAllocation.
func (in *DynamicAllocation) DeepCopy() *DynamicAllocation {
if in == nil {
return nil
}
out := new(DynamicAllocation)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ExecutorSpec) DeepCopyInto(out *ExecutorSpec) {
*out = *in
in.SparkPodSpec.DeepCopyInto(&out.SparkPodSpec)
if in.Instances != nil {
in, out := &in.Instances, &out.Instances
*out = new(int32)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ExecutorSpec.
func (in *ExecutorSpec) DeepCopy() *ExecutorSpec {
if in == nil {
return nil
}
out := new(ExecutorSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ServerSpec) DeepCopyInto(out *ServerSpec) {
*out = *in
in.SparkPodSpec.DeepCopyInto(&out.SparkPodSpec)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ServerSpec.
func (in *ServerSpec) DeepCopy() *ServerSpec {
if in == nil {
return nil
}
out := new(ServerSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnect) DeepCopyInto(out *SparkConnect) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
in.Spec.DeepCopyInto(&out.Spec)
in.Status.DeepCopyInto(&out.Status)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnect.
func (in *SparkConnect) DeepCopy() *SparkConnect {
if in == nil {
return nil
}
out := new(SparkConnect)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *SparkConnect) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectList) DeepCopyInto(out *SparkConnectList) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ListMeta.DeepCopyInto(&out.ListMeta)
if in.Items != nil {
in, out := &in.Items, &out.Items
*out = make([]SparkConnect, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectList.
func (in *SparkConnectList) DeepCopy() *SparkConnectList {
if in == nil {
return nil
}
out := new(SparkConnectList)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *SparkConnectList) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectServerStatus) DeepCopyInto(out *SparkConnectServerStatus) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectServerStatus.
func (in *SparkConnectServerStatus) DeepCopy() *SparkConnectServerStatus {
if in == nil {
return nil
}
out := new(SparkConnectServerStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectSpec) DeepCopyInto(out *SparkConnectSpec) {
*out = *in
if in.Image != nil {
in, out := &in.Image, &out.Image
*out = new(string)
**out = **in
}
if in.HadoopConf != nil {
in, out := &in.HadoopConf, &out.HadoopConf
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
if in.SparkConf != nil {
in, out := &in.SparkConf, &out.SparkConf
*out = make(map[string]string, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
in.Server.DeepCopyInto(&out.Server)
in.Executor.DeepCopyInto(&out.Executor)
if in.DynamicAllocation != nil {
in, out := &in.DynamicAllocation, &out.DynamicAllocation
*out = new(DynamicAllocation)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectSpec.
func (in *SparkConnectSpec) DeepCopy() *SparkConnectSpec {
if in == nil {
return nil
}
out := new(SparkConnectSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkConnectStatus) DeepCopyInto(out *SparkConnectStatus) {
*out = *in
if in.Conditions != nil {
in, out := &in.Conditions, &out.Conditions
*out = make([]metav1.Condition, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
out.Server = in.Server
if in.Executors != nil {
in, out := &in.Executors, &out.Executors
*out = make(map[string]int, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
in.StartTime.DeepCopyInto(&out.StartTime)
in.LastUpdateTime.DeepCopyInto(&out.LastUpdateTime)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkConnectStatus.
func (in *SparkConnectStatus) DeepCopy() *SparkConnectStatus {
if in == nil {
return nil
}
out := new(SparkConnectStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SparkPodSpec) DeepCopyInto(out *SparkPodSpec) {
*out = *in
if in.Cores != nil {
in, out := &in.Cores, &out.Cores
*out = new(int32)
**out = **in
}
if in.Memory != nil {
in, out := &in.Memory, &out.Memory
*out = new(string)
**out = **in
}
if in.Template != nil {
in, out := &in.Template, &out.Template
*out = new(v1.PodTemplateSpec)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SparkPodSpec.
func (in *SparkPodSpec) DeepCopy() *SparkPodSpec {
if in == nil {
return nil
}
out := new(SparkPodSpec)
in.DeepCopyInto(out)
return out
}


@ -1,36 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Package v1beta2 contains API Schema definitions for the v1beta2 API group
// +kubebuilder:object:generate=true
// +groupName=sparkoperator.k8s.io
package v1beta2
import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)
var (
// GroupVersion is group version used to register these objects.
GroupVersion = schema.GroupVersion{Group: "sparkoperator.k8s.io", Version: "v1beta2"}
// SchemeBuilder is used to add go types to the GroupVersionKind scheme.
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)


@ -1,135 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1beta2
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
func init() {
SchemeBuilder.Register(&ScheduledSparkApplication{}, &ScheduledSparkApplicationList{})
}
// ScheduledSparkApplicationSpec defines the desired state of ScheduledSparkApplication.
type ScheduledSparkApplicationSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make generate" to regenerate code after modifying this file
// Schedule is a cron schedule on which the application should run.
Schedule string `json:"schedule"`
// TimeZone is the time zone in which the cron schedule will be interpreted.
// This value is passed to time.LoadLocation, so it must be either "Local", "UTC",
// or a valid IANA location name e.g. "America/New_York".
// +optional
// Defaults to "Local".
TimeZone string `json:"timeZone,omitempty"`
// Template is a template from which SparkApplication instances can be created.
Template SparkApplicationSpec `json:"template"`
// Suspend is a flag telling the controller to suspend subsequent runs of the application if set to true.
// +optional
// Defaults to false.
Suspend *bool `json:"suspend,omitempty"`
// ConcurrencyPolicy is the policy governing concurrent SparkApplication runs.
ConcurrencyPolicy ConcurrencyPolicy `json:"concurrencyPolicy,omitempty"`
// SuccessfulRunHistoryLimit is the number of past successful runs of the application to keep.
// +optional
// Defaults to 1.
SuccessfulRunHistoryLimit *int32 `json:"successfulRunHistoryLimit,omitempty"`
// FailedRunHistoryLimit is the number of past failed runs of the application to keep.
// +optional
// Defaults to 1.
FailedRunHistoryLimit *int32 `json:"failedRunHistoryLimit,omitempty"`
}
// ScheduledSparkApplicationStatus defines the observed state of ScheduledSparkApplication.
type ScheduledSparkApplicationStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make generate" to regenerate code after modifying this file
// LastRun is the time when the last run of the application started.
// +nullable
LastRun metav1.Time `json:"lastRun,omitempty"`
// NextRun is the time when the next run of the application will start.
// +nullable
NextRun metav1.Time `json:"nextRun,omitempty"`
// LastRunName is the name of the SparkApplication for the most recent run of the application.
LastRunName string `json:"lastRunName,omitempty"`
// PastSuccessfulRunNames keeps the names of SparkApplications for past successful runs.
PastSuccessfulRunNames []string `json:"pastSuccessfulRunNames,omitempty"`
// PastFailedRunNames keeps the names of SparkApplications for past failed runs.
PastFailedRunNames []string `json:"pastFailedRunNames,omitempty"`
// ScheduleState is the current scheduling state of the application.
ScheduleState ScheduleState `json:"scheduleState,omitempty"`
// Reason tells why the ScheduledSparkApplication is in the particular ScheduleState.
Reason string `json:"reason,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=https://github.com/kubeflow/spark-operator/pull/1298"
// +kubebuilder:resource:scope=Namespaced,shortName=scheduledsparkapp,singular=scheduledsparkapplication
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:JSONPath=.spec.schedule,name=Schedule,type=string
// +kubebuilder:printcolumn:JSONPath=.spec.timeZone,name=TimeZone,type=string
// +kubebuilder:printcolumn:JSONPath=.spec.suspend,name=Suspend,type=string
// +kubebuilder:printcolumn:JSONPath=.status.lastRun,name=Last Run,type=date
// +kubebuilder:printcolumn:JSONPath=.status.lastRunName,name=Last Run Name,type=string
// +kubebuilder:printcolumn:JSONPath=.metadata.creationTimestamp,name=Age,type=date
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +genclient
// ScheduledSparkApplication is the Schema for the scheduledsparkapplications API.
type ScheduledSparkApplication struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata"`
Spec ScheduledSparkApplicationSpec `json:"spec"`
Status ScheduledSparkApplicationStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// ScheduledSparkApplicationList contains a list of ScheduledSparkApplication.
type ScheduledSparkApplicationList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []ScheduledSparkApplication `json:"items"`
}
type ConcurrencyPolicy string
const (
// ConcurrencyAllow allows SparkApplications to run concurrently.
ConcurrencyAllow ConcurrencyPolicy = "Allow"
// ConcurrencyForbid forbids concurrent runs of SparkApplications, skipping the next run if the previous
// one hasn't finished yet.
ConcurrencyForbid ConcurrencyPolicy = "Forbid"
// ConcurrencyReplace kills the currently running SparkApplication instance and replaces it with a new one.
ConcurrencyReplace ConcurrencyPolicy = "Replace"
)
type ScheduleState string
const (
ScheduleStateNew ScheduleState = ""
ScheduleStateValidating ScheduleState = "Validating"
ScheduleStateScheduled ScheduleState = "Scheduled"
ScheduleStateFailedValidation ScheduleState = "FailedValidation"
)
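For reference, the spec and constants above translate into a manifest roughly like this sketch (the name, schedule, and history limits are illustrative; the template body is elided):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled      # hypothetical name
spec:
  schedule: "@every 5m"
  timeZone: "UTC"
  concurrencyPolicy: Forbid     # skip the next run if the previous is still going
  successfulRunHistoryLimit: 1
  failedRunHistoryLimit: 3
  template: {}                  # a full SparkApplicationSpec goes here
```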


@ -1,39 +0,0 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
ci/
.helmignore
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
*.tmproj
.project
.idea/
.vscode/
# MacOS
.DS_Store
# helm-unittest
tests
.debug
__snapshot__
# helm-docs
README.md.gotmpl


@ -1,39 +1,11 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v2
name: spark-operator
description: A Helm chart for Spark on Kubernetes operator.
version: 2.2.1
appVersion: 2.2.1
description: A Helm chart for Spark on Kubernetes operator
version: 1.1.10
appVersion: v1beta2-1.2.3-3.1.1
keywords:
- apache spark
- big data
home: https://github.com/kubeflow/spark-operator
- spark
home: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
maintainers:
- name: yuchaoran2011
email: yuchaoran2011@gmail.com
url: https://github.com/yuchaoran2011
- name: ChenYi015
email: github@chenyicn.net
url: https://github.com/ChenYi015
- name: yuchaoran2011
email: yuchaoran2011@gmail.com


@ -1,19 +1,15 @@
# spark-operator
![Version: 2.2.1](https://img.shields.io/badge/Version-2.2.1-informational?style=flat-square) ![AppVersion: 2.2.1](https://img.shields.io/badge/AppVersion-2.2.1-informational?style=flat-square)
A Helm chart for Spark on Kubernetes operator.
**Homepage:** <https://github.com/kubeflow/spark-operator>
A Helm chart for Spark on Kubernetes operator
## Introduction
This chart bootstraps a [Kubernetes Operator for Apache Spark](https://github.com/kubeflow/spark-operator) deployment using the [Helm](https://helm.sh) package manager.
This chart bootstraps a [Kubernetes Operator for Apache Spark](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) deployment using the [Helm](https://helm.sh) package manager.
## Prerequisites
- Helm >= 3
- Kubernetes >= 1.16
- Kubernetes >= 1.13
## Previous Helm Chart
@ -23,170 +19,120 @@ The previous `spark-operator` Helm chart hosted at [helm/charts](https://github.
- Previous versions of the Helm chart have not been migrated, and the version has been set to `1.0.0` at the onset. If you are looking for old versions of the chart, it's best to run `helm pull incubator/sparkoperator --version <your-version>` until you are ready to move to this repository's version.
- Several configuration properties have been changed, carefully review the [values](#values) section below to make sure you're aligned with the new values.
## Usage
### Add Helm Repo
## Installing the chart
```shell
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
$ helm install my-release spark-operator/spark-operator
```
See [helm repo](https://helm.sh/docs/helm/helm_repo) for command documentation.
### Install the chart
This will create a release of `spark-operator` in the default namespace. To install in a different one:
```shell
helm install [RELEASE_NAME] spark-operator/spark-operator
$ helm install -n spark my-release spark-operator/spark-operator
```
For example, if you want to create a release with name `spark-operator` in the `spark-operator` namespace:
Note that `helm` will fail to install if the namespace doesn't exist. Either create the namespace beforehand or pass the `--create-namespace` flag to the `helm install` command.
## Uninstalling the chart
To uninstall `my-release`:
```shell
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace
$ helm uninstall my-release
```
Note that by passing the `--create-namespace` flag to the `helm install` command, `helm` will create the release namespace if it does not exist.
The command removes all the Kubernetes components associated with the chart and deletes the release, except for the `crds`; those will have to be removed manually.
See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation.
### Upgrade the chart

```shell
helm upgrade [RELEASE_NAME] spark-operator/spark-operator [flags]
```

See [helm upgrade](https://helm.sh/docs/helm/helm_upgrade) for command documentation.

### Uninstall the chart

```shell
helm uninstall [RELEASE_NAME]
```

This removes all the Kubernetes resources associated with the chart and deletes the release, except for the `crds`; those will have to be removed manually.

See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command documentation.

## Test the chart

Install the [chart-testing cli](https://github.com/helm/chart-testing#installation). On macOS, you can just:

```bash
pip install yamale
pip install yamllint
brew install chart-testing
```

Run `ct lint` and verify `All charts linted successfully`:

```bash
Chart version ok.
Validating /Users/chethanuk/Work/Github/Personal/spark-on-k8s-operator-1/charts/spark-operator-chart/Chart.yaml...
Validation success! 👍
Validating maintainers...
==> Linting charts/spark-operator-chart
[INFO] Chart.yaml: icon is recommended

1 chart(s) linted, 0 chart(s) failed
------------------------------------------------------------------------------------------------------------------------
 ✔︎ spark-operator => (version: "1.1.0", path: "charts/spark-operator-chart")
------------------------------------------------------------------------------------------------------------------------
All charts linted successfully
```
## Values
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| nameOverride | string | `""` | String to partially override release name. |
| fullnameOverride | string | `""` | String to fully override release name. |
| commonLabels | object | `{}` | Common labels to add to the resources. |
| image.registry | string | `"ghcr.io"` | Image registry. |
| image.repository | string | `"kubeflow/spark-operator/controller"` | Image repository. |
| image.tag | string | If not set, the chart appVersion will be used. | Image tag. |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy. |
| image.pullSecrets | list | `[]` | Image pull secrets for private image registry. |
| controller.replicas | int | `1` | Number of replicas of controller. |
| controller.leaderElection.enable | bool | `true` | Specifies whether to enable leader election for controller. |
| controller.workers | int | `10` | Reconcile concurrency, higher values might increase memory usage. |
| controller.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
| controller.logEncoder | string | `"console"` | Configure the encoder of logging, can be one of `console` or `json`. |
| controller.driverPodCreationGracePeriod | string | `"10s"` | Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created. |
| controller.maxTrackedExecutorPerApp | int | `1000` | Specifies the maximum number of Executor pods that can be tracked by the controller per SparkApplication. |
| controller.uiService.enable | bool | `true` | Specifies whether to create service for Spark web UI. |
| controller.uiIngress.enable | bool | `false` | Specifies whether to create ingress for Spark web UI. `controller.uiService.enable` must be `true` to enable ingress. |
| controller.uiIngress.urlFormat | string | `""` | Ingress URL format. Required if `controller.uiIngress.enable` is true. |
| controller.uiIngress.ingressClassName | string | `""` | Optionally set the ingressClassName. |
| controller.uiIngress.tls | list | `[]` | Optionally set default TLS configuration for the Spark UI's ingress. `ingressTLS` in the SparkApplication spec overrides this. |
| controller.uiIngress.annotations | object | `{}` | Optionally set default ingress annotations for the Spark UI's ingress. `ingressAnnotations` in the SparkApplication spec overrides this. |
| controller.batchScheduler.enable | bool | `false` | Specifies whether to enable batch scheduler for spark jobs scheduling. If enabled, users can specify batch scheduler name in spark application. |
| controller.batchScheduler.kubeSchedulerNames | list | `[]` | Specifies a list of kube-scheduler names for scheduling Spark pods. |
| controller.batchScheduler.default | string | `""` | Default batch scheduler to be used if not specified by the user. If specified, this value must be either "volcano" or "yunikorn". Specifying any other value will cause the controller to error on startup. |
| controller.serviceAccount.create | bool | `true` | Specifies whether to create a service account for the controller. |
| controller.serviceAccount.name | string | `""` | Optional name for the controller service account. |
| controller.serviceAccount.annotations | object | `{}` | Extra annotations for the controller service account. |
| controller.serviceAccount.automountServiceAccountToken | bool | `true` | Auto-mount service account token to the controller pods. |
| controller.rbac.create | bool | `true` | Specifies whether to create RBAC resources for the controller. |
| controller.rbac.annotations | object | `{}` | Extra annotations for the controller RBAC resources. |
| controller.labels | object | `{}` | Extra labels for controller pods. |
| controller.annotations | object | `{}` | Extra annotations for controller pods. |
| controller.volumes | list | `[{"emptyDir":{"sizeLimit":"1Gi"},"name":"tmp"}]` | Volumes for controller pods. |
| controller.nodeSelector | object | `{}` | Node selector for controller pods. |
| controller.affinity | object | `{}` | Affinity for controller pods. |
| controller.tolerations | list | `[]` | List of node taints to tolerate for controller pods. |
| controller.priorityClassName | string | `""` | Priority class for controller pods. |
| controller.podSecurityContext | object | `{"fsGroup":185}` | Security context for controller pods. |
| controller.topologySpreadConstraints | list | `[]` | Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in. Ref: [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/). The labelSelector field in topology spread constraint will be set to the selector labels for controller pods if not specified. |
| controller.env | list | `[]` | Environment variables for controller containers. |
| controller.envFrom | list | `[]` | Environment variable sources for controller containers. |
| controller.volumeMounts | list | `[{"mountPath":"/tmp","name":"tmp","readOnly":false}]` | Volume mounts for controller containers. |
| controller.resources | object | `{}` | Pod resource requests and limits for controller containers. Note that each job submission will spawn a JVM within the controller pods using "/usr/local/openjdk-11/bin/java -Xmx128m". Kubernetes may kill these Java processes at will to enforce resource limits. When that happens, you will see the following error: 'failed to run spark-submit for SparkApplication [...]: signal: killed'. If this happens, you may want to increase memory limits. |
| controller.securityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"readOnlyRootFilesystem":true,"runAsNonRoot":true,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for controller containers. |
| controller.sidecars | list | `[]` | Sidecar containers for controller pods. |
| controller.podDisruptionBudget.enable | bool | `false` | Specifies whether to create pod disruption budget for controller. Ref: [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) |
| controller.podDisruptionBudget.minAvailable | int | `1` | The number of pods that must be available. Requires `controller.replicas` to be greater than 1. |
| controller.pprof.enable | bool | `false` | Specifies whether to enable pprof. |
| controller.pprof.port | int | `6060` | Specifies pprof port. |
| controller.pprof.portName | string | `"pprof"` | Specifies pprof service port name. |
| controller.workqueueRateLimiter.bucketQPS | int | `50` | Specifies the average rate of items processed by the workqueue rate limiter. |
| controller.workqueueRateLimiter.bucketSize | int | `500` | Specifies the maximum number of items that can be in the workqueue at any given time. |
| controller.workqueueRateLimiter.maxDelay.enable | bool | `true` | Specifies whether to enable max delay for the workqueue rate limiter. This is useful to avoid losing events when the workqueue is full. |
| controller.workqueueRateLimiter.maxDelay.duration | string | `"6h"` | Specifies the maximum delay duration for the workqueue rate limiter. |
| webhook.enable | bool | `true` | Specifies whether to enable webhook. |
| webhook.replicas | int | `1` | Number of replicas of webhook server. |
| webhook.leaderElection.enable | bool | `true` | Specifies whether to enable leader election for webhook. |
| webhook.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
| webhook.logEncoder | string | `"console"` | Configure the encoder of logging, can be one of `console` or `json`. |
| webhook.port | int | `9443` | Specifies webhook port. |
| webhook.portName | string | `"webhook"` | Specifies webhook service port name. |
| webhook.failurePolicy | string | `"Fail"` | Specifies how unrecognized errors are handled. Available options are `Ignore` or `Fail`. |
| webhook.timeoutSeconds | int | `10` | Specifies the timeout seconds of the webhook, the value must be between 1 and 30. |
| webhook.resourceQuotaEnforcement.enable | bool | `false` | Specifies whether to enable the ResourceQuota enforcement for SparkApplication resources. |
| webhook.serviceAccount.create | bool | `true` | Specifies whether to create a service account for the webhook. |
| webhook.serviceAccount.name | string | `""` | Optional name for the webhook service account. |
| webhook.serviceAccount.annotations | object | `{}` | Extra annotations for the webhook service account. |
| webhook.serviceAccount.automountServiceAccountToken | bool | `true` | Auto-mount service account token to the webhook pods. |
| webhook.rbac.create | bool | `true` | Specifies whether to create RBAC resources for the webhook. |
| webhook.rbac.annotations | object | `{}` | Extra annotations for the webhook RBAC resources. |
| webhook.labels | object | `{}` | Extra labels for webhook pods. |
| webhook.annotations | object | `{}` | Extra annotations for webhook pods. |
| webhook.sidecars | list | `[]` | Sidecar containers for webhook pods. |
| webhook.volumes | list | `[{"emptyDir":{"sizeLimit":"500Mi"},"name":"serving-certs"}]` | Volumes for webhook pods. |
| webhook.nodeSelector | object | `{}` | Node selector for webhook pods. |
| webhook.affinity | object | `{}` | Affinity for webhook pods. |
| webhook.tolerations | list | `[]` | List of node taints to tolerate for webhook pods. |
| webhook.priorityClassName | string | `""` | Priority class for webhook pods. |
| webhook.podSecurityContext | object | `{"fsGroup":185}` | Security context for webhook pods. |
| webhook.topologySpreadConstraints | list | `[]` | Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in. Ref: [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/). The labelSelector field in topology spread constraint will be set to the selector labels for webhook pods if not specified. |
| webhook.env | list | `[]` | Environment variables for webhook containers. |
| webhook.envFrom | list | `[]` | Environment variable sources for webhook containers. |
| webhook.volumeMounts | list | `[{"mountPath":"/etc/k8s-webhook-server/serving-certs","name":"serving-certs","readOnly":false,"subPath":"serving-certs"}]` | Volume mounts for webhook containers. |
| webhook.resources | object | `{}` | Pod resource requests and limits for webhook pods. |
| webhook.securityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"readOnlyRootFilesystem":true,"runAsNonRoot":true,"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for webhook containers. |
| webhook.podDisruptionBudget.enable | bool | `false` | Specifies whether to create pod disruption budget for webhook. Ref: [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) |
| webhook.podDisruptionBudget.minAvailable | int | `1` | The number of pods that must be available. Requires `webhook.replicas` to be greater than 1. |
| spark.jobNamespaces | list | `["default"]` | List of namespaces in which to run Spark jobs. If an empty string is included, all namespaces will be allowed. Make sure the namespaces already exist. |
| spark.serviceAccount.create | bool | `true` | Specifies whether to create a service account for spark applications. |
| spark.serviceAccount.name | string | `""` | Optional name for the spark service account. |
| spark.serviceAccount.annotations | object | `{}` | Optional annotations for the spark service account. |
| spark.serviceAccount.automountServiceAccountToken | bool | `true` | Auto-mount service account token to the spark applications pods. |
| spark.rbac.create | bool | `true` | Specifies whether to create RBAC resources for spark applications. |
| spark.rbac.annotations | object | `{}` | Optional annotations for the spark application RBAC resources. |
| prometheus.metrics.enable | bool | `true` | Specifies whether to enable prometheus metrics scraping. |
| prometheus.metrics.port | int | `8080` | Metrics port. |
| prometheus.metrics.portName | string | `"metrics"` | Metrics port name. |
| prometheus.metrics.endpoint | string | `"/metrics"` | Metrics serving endpoint. |
| prometheus.metrics.prefix | string | `""` | Metrics prefix, will be added to all exported metrics. |
| prometheus.metrics.jobStartLatencyBuckets | string | `"30,60,90,120,150,180,210,240,270,300"` | Job Start Latency histogram buckets. Specified in seconds. |
| prometheus.podMonitor.create | bool | `false` | Specifies whether to create pod monitor. Note that prometheus metrics should be enabled as well. |
| prometheus.podMonitor.labels | object | `{}` | Pod monitor labels |
| prometheus.podMonitor.jobLabel | string | `"spark-operator-podmonitor"` | The label to use to retrieve the job name from |
| prometheus.podMonitor.podMetricsEndpoint | object | `{"interval":"5s","scheme":"http"}` | Prometheus metrics endpoint properties. `metrics.portName` will be used as a port |
| certManager.enable | bool | `false` | Specifies whether to use [cert-manager](https://cert-manager.io) to generate certificate for webhook. `webhook.enable` must be set to `true` to enable cert-manager. |
| certManager.issuerRef | object | A self-signed issuer will be created and used if not specified. | The reference to the issuer. |
| certManager.duration | string | `2160h` (90 days) will be used if not specified. | The duration of the certificate validity (e.g. `2160h`). See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate). |
| certManager.renewBefore | string | 1/3 of issued certificates lifetime. | The duration before the certificate expiration to renew the certificate (e.g. `720h`). See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate). |
| affinity | object | `{}` | Affinity for pod assignment |
| batchScheduler.enable | bool | `false` | Enable batch scheduler for spark jobs scheduling. If enabled, users can specify batch scheduler name in spark application |
| controllerThreads | int | `10` | Operator concurrency; higher values might increase memory usage |
| fullnameOverride | string | `""` | String to override release name |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy |
| image.repository | string | `"gcr.io/spark-operator/spark-operator"` | Image repository |
| image.tag | string | `""` | Overrides the image tag whose default is the chart appVersion. |
| imagePullSecrets | list | `[]` | Image pull secrets |
| uiService.enable | bool | `true` | Enable UI service creation for Spark applications |
| ingressUrlFormat | string | `""` | Ingress URL format. Requires the UI service to be enabled by setting `uiService.enable` to true. |
| istio.enabled | bool | `false` | When using `istio`, spark jobs need to run without a sidecar to properly terminate |
| labelSelectorFilter | string | `""` | A comma-separated list of key=value or key labels to filter resources during watch and list, based on the specified labels. |
| leaderElection.lockName | string | `"spark-operator-lock"` | Leader election lock name. Ref: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#enabling-leader-election-for-high-availability. |
| leaderElection.lockNamespace | string | `""` | Optionally store the lock in another namespace. Defaults to the operator's namespace |
| logLevel | int | `2` | Set higher levels for more verbose logging |
| metrics.enable | bool | `true` | Enable prometheus metric scraping |
| metrics.endpoint | string | `"/metrics"` | Metrics serving endpoint |
| metrics.port | int | `10254` | Metrics port |
| metrics.portName | string | `"metrics"` | Metrics port name |
| metrics.prefix | string | `""` | Metric prefix, will be added to all exported metrics |
| nameOverride | string | `""` | String to partially override `spark-operator.fullname` template (will maintain the release name) |
| nodeSelector | object | `{}` | Node labels for pod assignment |
| podAnnotations | object | `{}` | Additional annotations to add to the pod |
| podLabels | object | `{}` | Additional labels to add to the pod |
| podMonitor | object | `{"enable":false,"jobLabel":"spark-operator-podmonitor","labels":{},"podMetricsEndpoint":{"interval":"5s","scheme":"http"}}` | Prometheus pod monitor for operator's pod. |
| podMonitor.enable | bool | `false` | If enabled, a pod monitor for operator's pod will be submitted. Note that prometheus metrics should be enabled as well. |
| podMonitor.jobLabel | string | `"spark-operator-podmonitor"` | The label to use to retrieve the job name from |
| podMonitor.labels | object | `{}` | Pod monitor labels |
| podMonitor.podMetricsEndpoint | object | `{"interval":"5s","scheme":"http"}` | Prometheus metrics endpoint properties. `metrics.portName` will be used as a port |
| podSecurityContext | object | `{}` | Pod security context |
| rbac.create | bool | `false` | **DEPRECATED**: use `createRole` and `createClusterRole` instead |
| rbac.createClusterRole | bool | `true` | Create and use RBAC `ClusterRole` resources |
| rbac.createRole | bool | `true` | Create and use RBAC `Role` resources |
| replicaCount | int | `1` | Desired number of pods; leader election will be enabled if this is greater than 1 |
| resourceQuotaEnforcement.enable | bool | `false` | Whether to enable the ResourceQuota enforcement for SparkApplication resources. Requires the webhook to be enabled by setting `webhook.enable` to true. Ref: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#enabling-resource-quota-enforcement. |
| resources | object | `{}` | Pod resource requests and limits |
| resyncInterval | int | `30` | Operator resync interval. Note that the operator responds to events (e.g. create, update) independently of this setting |
| securityContext | object | `{}` | Operator container security context |
| serviceAccounts.spark.annotations | object | `{}` | Optional annotations for the spark service account |
| serviceAccounts.spark.create | bool | `true` | Create a service account for spark apps |
| serviceAccounts.spark.name | string | `""` | Optional name for the spark service account |
| serviceAccounts.sparkoperator.annotations | object | `{}` | Optional annotations for the operator service account |
| serviceAccounts.sparkoperator.create | bool | `true` | Create a service account for the operator |
| serviceAccounts.sparkoperator.name | string | `""` | Optional name for the operator service account |
| sparkJobNamespace | string | `""` | Set this if running spark jobs in a different namespace than the operator |
| tolerations | list | `[]` | List of node taints to tolerate |
| webhook.initAnnotations | object | `{"helm.sh/hook":"pre-install, pre-upgrade","helm.sh/hook-weight":"50"}` | The annotations applied to init job, required to restore certs deleted by the cleanup job during upgrade |
| webhook.cleanupAnnotations | object | `{"helm.sh/hook":"pre-delete, pre-upgrade","helm.sh/hook-delete-policy":"hook-succeeded"}` | The annotations applied to the cleanup job, required for helm lifecycle hooks |
| webhook.enable | bool | `false` | Enable webhook server |
| webhook.namespaceSelector | string | `""` | The webhook server will only operate on namespaces with this label, specified in the form key1=value1,key2=value2. Empty string (default) will operate on all namespaces |
| webhook.port | int | `8080` | Webhook service port |
## Maintainers
| Name | Email | Url |
| ---- | ------ | --- |
| yuchaoran2011 | <yuchaoran2011@gmail.com> | <https://github.com/yuchaoran2011> |
| ChenYi015 | <github@chenyicn.net> | <https://github.com/ChenYi015> |

View File

@@ -1,13 +1,7 @@
{{ template "chart.header" . }}
{{ template "chart.deprecationWarning" . }}
{{ template "chart.badgesSection" . }}
{{ template "chart.description" . }}
{{ template "chart.homepageLine" . }}
## Introduction
This chart bootstraps a [Kubernetes Operator for Apache Spark]({{template "chart.homepage" . }}) deployment using the [Helm](https://helm.sh) package manager.
@@ -15,7 +9,7 @@ This chart bootstraps a [Kubernetes Operator for Apache Spark]({{template "chart
## Prerequisites
- Helm >= 3
- Kubernetes >= 1.16
- Kubernetes >= 1.13
## Previous Helm Chart
@@ -25,53 +19,32 @@ The previous `spark-operator` Helm chart hosted at [helm/charts](https://github.
- Previous versions of the Helm chart have not been migrated, and the version has been set to `1.0.0` at the onset. If you are looking for old versions of the chart, it's best to run `helm pull incubator/sparkoperator --version <your-version>` until you are ready to move to this repository's version.
- Several configuration properties have been changed, carefully review the [values](#values) section below to make sure you're aligned with the new values.
## Usage
### Add Helm Repo
## Installing the chart
```shell
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
$ helm install my-release spark-operator/spark-operator
```
See [helm repo](https://helm.sh/docs/helm/helm_repo) for command documentation.
### Install the chart
This will create a release of `spark-operator` in the default namespace. To install in a different one:
```shell
helm install [RELEASE_NAME] spark-operator/spark-operator
$ helm install -n spark my-release spark-operator/spark-operator
```
For example, if you want to create a release with name `spark-operator` in the `spark-operator` namespace:
Note that `helm` will fail to install if the namespace doesn't exist. Either create the namespace beforehand or pass the `--create-namespace` flag to the `helm install` command.
## Uninstalling the chart
To uninstall `my-release`:
```shell
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace
$ helm uninstall my-release
```
Note that by passing the `--create-namespace` flag to the `helm install` command, `helm` will create the release namespace if it does not exist.
See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation.
### Upgrade the chart
```shell
helm upgrade [RELEASE_NAME] spark-operator/spark-operator [flags]
```
See [helm upgrade](https://helm.sh/docs/helm/helm_upgrade) for command documentation.
### Uninstall the chart
```shell
helm uninstall [RELEASE_NAME]
```
This removes all the Kubernetes resources associated with the chart and deletes the release, except for the `crds`, which have to be removed manually.
See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command documentation.
The command removes all the Kubernetes components associated with the chart and deletes the release, except for the `crds`, which have to be removed manually.
{{ template "chart.valuesSection" . }}

View File

@@ -1,2 +0,0 @@
image:
tag: local

View File

@@ -1,5 +0,0 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker

View File

@@ -1,272 +0,0 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
api-approved.kubernetes.io: https://github.com/kubeflow/spark-operator/pull/1298
controller-gen.kubebuilder.io/version: v0.17.1
name: sparkconnects.sparkoperator.k8s.io
spec:
group: sparkoperator.k8s.io
names:
kind: SparkConnect
listKind: SparkConnectList
plural: sparkconnects
shortNames:
- sparkconn
singular: sparkconnect
scope: Namespaced
versions:
- additionalPrinterColumns:
- jsonPath: .metadata.creationTimestamp
name: Age
type: date
name: v1alpha1
schema:
openAPIV3Schema:
description: SparkConnect is the Schema for the sparkconnects API.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: SparkConnectSpec defines the desired state of SparkConnect.
properties:
dynamicAllocation:
description: |-
DynamicAllocation configures dynamic allocation that becomes available for the Kubernetes
scheduler backend since Spark 3.0.
properties:
enabled:
description: Enabled controls whether dynamic allocation is enabled
or not.
type: boolean
initialExecutors:
description: |-
InitialExecutors is the initial number of executors to request. If .spec.executor.instances
is also set, the initial number of executors is set to the bigger of that and this option.
format: int32
type: integer
maxExecutors:
description: MaxExecutors is the upper bound for the number of
executors if dynamic allocation is enabled.
format: int32
type: integer
minExecutors:
description: MinExecutors is the lower bound for the number of
executors if dynamic allocation is enabled.
format: int32
type: integer
shuffleTrackingEnabled:
description: |-
ShuffleTrackingEnabled enables shuffle file tracking for executors, which allows dynamic allocation without
the need for an external shuffle service. This option will try to keep alive executors that are storing
shuffle data for active jobs. If external shuffle service is enabled, set ShuffleTrackingEnabled to false.
ShuffleTrackingEnabled is true by default if dynamicAllocation.enabled is true.
type: boolean
shuffleTrackingTimeout:
description: |-
ShuffleTrackingTimeout controls the timeout in milliseconds for executors that are holding
shuffle data if shuffle tracking is enabled (true by default if dynamic allocation is enabled).
format: int64
type: integer
type: object
executor:
description: Executor is the Spark executor specification.
properties:
cores:
description: Cores maps to `spark.driver.cores` or `spark.executor.cores`
for the driver and executors, respectively.
format: int32
minimum: 1
type: integer
instances:
description: Instances is the number of executor instances.
format: int32
minimum: 0
type: integer
memory:
description: Memory is the amount of memory to request for the
pod.
type: string
template:
description: |-
Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
Spark version >= 3.0.0 is required.
Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
hadoopConf:
additionalProperties:
type: string
description: |-
HadoopConf carries user-specified Hadoop configuration properties as they would use the "--conf" option
in spark-submit. The SparkApplication controller automatically adds prefix "spark.hadoop." to Hadoop
configuration properties.
type: object
image:
description: |-
Image is the container image for the driver, executor, and init-container. Any custom container images for the
driver, executor, or init-container take precedence over this.
type: string
server:
description: Server is the Spark connect server specification.
properties:
cores:
description: Cores maps to `spark.driver.cores` or `spark.executor.cores`
for the driver and executors, respectively.
format: int32
minimum: 1
type: integer
memory:
description: Memory is the amount of memory to request for the
pod.
type: string
template:
description: |-
Template is a pod template that can be used to define the driver or executor pod configurations that Spark configurations do not support.
Spark version >= 3.0.0 is required.
Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
sparkConf:
additionalProperties:
type: string
description: |-
SparkConf carries user-specified Spark configuration properties as they would use the "--conf" option in
spark-submit.
type: object
sparkVersion:
description: SparkVersion is the version of Spark that the Spark Connect
server uses.
type: string
required:
- executor
- server
- sparkVersion
type: object
status:
description: SparkConnectStatus defines the observed state of SparkConnect.
properties:
conditions:
description: Represents the latest available observations of a SparkConnect's
current state.
items:
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
x-kubernetes-list-map-keys:
- type
x-kubernetes-list-type: map
executors:
additionalProperties:
type: integer
description: Executors represents the current state of the SparkConnect
executors.
type: object
lastUpdateTime:
description: LastUpdateTime is the time at which the SparkConnect
controller last updated the SparkConnect.
format: date-time
type: string
server:
description: Server represents the current state of the SparkConnect
server.
properties:
podIp:
description: PodIP is the IP address of the pod that is running
the Spark Connect server.
type: string
podName:
description: PodName is the name of the pod that is running the
Spark Connect server.
type: string
serviceName:
description: ServiceName is the name of the service that is exposing
the Spark Connect server.
type: string
type: object
startTime:
description: StartTime is the time at which the SparkConnect controller
started processing the SparkConnect.
format: date-time
type: string
state:
description: State represents the current state of the SparkConnect.
type: string
type: object
required:
- metadata
- spec
type: object
served: true
storage: true
subresources:
status: {}

View File

@@ -1,19 +1,3 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
@@ -57,9 +41,6 @@ helm.sh/chart: {{ include "spark-operator.chart" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- with .Values.commonLabels }}
{{ toYaml . }}
{{- end }}
{{- end }}
{{/*
@@ -71,8 +52,25 @@ app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Spark Operator image
Create the name of the service account to be used by the operator
*/}}
{{- define "spark-operator.image" -}}
{{ printf "%s/%s:%s" .Values.image.registry .Values.image.repository (.Values.image.tag | default .Chart.AppVersion | toString) }}
{{- define "spark-operator.serviceAccountName" -}}
{{- if .Values.serviceAccounts.sparkoperator.create -}}
{{ default (include "spark-operator.fullname" .) .Values.serviceAccounts.sparkoperator.name }}
{{- else -}}
{{ default "default" .Values.serviceAccounts.sparkoperator.name }}
{{- end -}}
{{- end -}}
{{/*
Create the name of the service account to be used by spark apps
*/}}
{{- define "spark.serviceAccountName" -}}
{{- if .Values.serviceAccounts.spark.create -}}
{{- $sparkServiceaccount := printf "%s-%s" .Release.Name "spark" -}}
{{ default $sparkServiceaccount .Values.serviceAccounts.spark.name }}
{{- else -}}
{{ default "default" .Values.serviceAccounts.spark.name }}
{{- end -}}
{{- end -}}

View File

@@ -1,29 +0,0 @@
{{- /*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/ -}}
{{/*
Create the name of the webhook certificate issuer.
*/}}
{{- define "spark-operator.certManager.issuer.name" -}}
{{ include "spark-operator.name" . }}-self-signed-issuer
{{- end -}}
{{/*
Create the name of the certificate to be used by webhook.
*/}}
{{- define "spark-operator.certManager.certificate.name" -}}
{{ include "spark-operator.name" . }}-certificate
{{- end -}}

View File

@@ -1,56 +0,0 @@
{{- /*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/ -}}
{{- if .Values.webhook.enable }}
{{- if .Values.certManager.enable }}
{{- if not (.Capabilities.APIVersions.Has "cert-manager.io/v1/Certificate") }}
{{- fail "The cluster does not support the required API version `cert-manager.io/v1` for `Certificate`." }}
{{- end }}
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: {{ include "spark-operator.certManager.certificate.name" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
secretName: {{ include "spark-operator.webhook.secretName" . }}
issuerRef:
{{- if not .Values.certManager.issuerRef }}
group: cert-manager.io
kind: Issuer
name: {{ include "spark-operator.certManager.issuer.name" . }}
{{- else }}
{{- toYaml .Values.certManager.issuerRef | nindent 4 }}
{{- end }}
commonName: {{ include "spark-operator.webhook.serviceName" . }}.{{ .Release.Namespace }}.svc
dnsNames:
- {{ include "spark-operator.webhook.serviceName" . }}.{{ .Release.Namespace }}.svc
- {{ include "spark-operator.webhook.serviceName" . }}.{{ .Release.Namespace }}.svc.cluster.local
subject:
organizationalUnits:
- spark-operator
usages:
- server auth
- client auth
{{- with .Values.certManager.duration }}
duration: {{ . }}
{{- end }}
{{- with .Values.certManager.renewBefore }}
renewBefore: {{ . }}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -1,34 +0,0 @@
{{- /*
Copyright 2025 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/ -}}
{{- if .Values.webhook.enable }}
{{- if .Values.certManager.enable }}
{{- if not .Values.certManager.issuerRef }}
{{- if not (.Capabilities.APIVersions.Has "cert-manager.io/v1/Issuer") }}
{{- fail "The cluster does not support the required API version `cert-manager.io/v1` for `Issuer`." }}
{{- end }}
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: {{ include "spark-operator.certManager.issuer.name" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
selfSigned: {}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -1,217 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{/*
Create the name of controller component
*/}}
{{- define "spark-operator.controller.name" -}}
{{- include "spark-operator.fullname" . }}-controller
{{- end -}}
{{/*
Common labels for the controller
*/}}
{{- define "spark-operator.controller.labels" -}}
{{ include "spark-operator.labels" . }}
app.kubernetes.io/component: controller
{{- end -}}
{{/*
Selector labels for the controller
*/}}
{{- define "spark-operator.controller.selectorLabels" -}}
{{ include "spark-operator.selectorLabels" . }}
app.kubernetes.io/component: controller
{{- end -}}
{{/*
Create the name of the service account to be used by the controller
*/}}
{{- define "spark-operator.controller.serviceAccountName" -}}
{{- if .Values.controller.serviceAccount.create -}}
{{ .Values.controller.serviceAccount.name | default (include "spark-operator.controller.name" .) }}
{{- else -}}
{{ .Values.controller.serviceAccount.name | default "default" }}
{{- end -}}
{{- end -}}
{{/*
Create the name of the cluster role to be used by the controller
*/}}
{{- define "spark-operator.controller.clusterRoleName" -}}
{{ include "spark-operator.controller.name" . }}
{{- end }}
{{/*
Create the name of the cluster role binding to be used by the controller
*/}}
{{- define "spark-operator.controller.clusterRoleBindingName" -}}
{{ include "spark-operator.controller.clusterRoleName" . }}
{{- end }}
{{/*
Create the name of the role to be used by the controller
*/}}
{{- define "spark-operator.controller.roleName" -}}
{{ include "spark-operator.controller.name" . }}
{{- end }}
{{/*
Create the name of the role binding to be used by the controller
*/}}
{{- define "spark-operator.controller.roleBindingName" -}}
{{ include "spark-operator.controller.roleName" . }}
{{- end }}
{{/*
Create the name of the deployment to be used by controller
*/}}
{{- define "spark-operator.controller.deploymentName" -}}
{{ include "spark-operator.controller.name" . }}
{{- end -}}
{{/*
Create the name of the lease resource to be used by leader election
*/}}
{{- define "spark-operator.controller.leaderElectionName" -}}
{{ include "spark-operator.controller.name" . }}-lock
{{- end -}}
{{/*
Create the name of the pod disruption budget to be used by controller
*/}}
{{- define "spark-operator.controller.podDisruptionBudgetName" -}}
{{ include "spark-operator.controller.name" . }}-pdb
{{- end -}}
{{/*
Create the name of the service used by controller
*/}}
{{- define "spark-operator.controller.serviceName" -}}
{{ include "spark-operator.controller.name" . }}-svc
{{- end -}}
{{/*
Create the role policy rules for the controller in every Spark job namespace
*/}}
{{- define "spark-operator.controller.policyRules" -}}
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- deletecollection
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- apiGroups:
- ""
resources:
- services
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- apiGroups:
- ""
resources:
- events
verbs:
- create
- update
- patch
- apiGroups:
- extensions
- networking.k8s.io
resources:
- ingresses
verbs:
- get
- list
- watch
- create
- update
- delete
- apiGroups:
- sparkoperator.k8s.io
resources:
- sparkapplications
- scheduledsparkapplications
- sparkconnects
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- apiGroups:
- sparkoperator.k8s.io
resources:
- sparkapplications/status
- sparkapplications/finalizers
- scheduledsparkapplications/status
- scheduledsparkapplications/finalizers
- sparkconnects/status
verbs:
- get
- update
- patch
{{- if .Values.controller.batchScheduler.enable }}
{{/* required for the `volcano` batch scheduler */}}
- apiGroups:
- scheduling.incubator.k8s.io
- scheduling.sigs.dev
- scheduling.volcano.sh
resources:
- podgroups
verbs:
- "*"
{{- end }}
{{- end -}}

View File

@@ -1,206 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "spark-operator.controller.deploymentName" . }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.controller.replicas }}
selector:
matchLabels:
{{- include "spark-operator.controller.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "spark-operator.controller.selectorLabels" . | nindent 8 }}
{{- with .Values.controller.labels }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if or .Values.controller.annotations .Values.prometheus.metrics.enable }}
annotations:
{{- if .Values.prometheus.metrics.enable }}
prometheus.io/scrape: "true"
prometheus.io/port: {{ .Values.prometheus.metrics.port | quote }}
prometheus.io/path: {{ .Values.prometheus.metrics.endpoint }}
{{- end }}
{{- with .Values.controller.annotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- end }}
spec:
containers:
- name: spark-operator-controller
image: {{ include "spark-operator.image" . }}
{{- with .Values.image.pullPolicy }}
imagePullPolicy: {{ . }}
{{- end }}
args:
- controller
- start
{{- with .Values.controller.logLevel }}
- --zap-log-level={{ . }}
{{- end }}
{{- with .Values.controller.logEncoder }}
- --zap-encoder={{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if has "" . }}
- --namespaces=""
{{- else }}
- --namespaces={{ . | join "," }}
{{- end }}
{{- end }}
- --controller-threads={{ .Values.controller.workers }}
- --enable-ui-service={{ .Values.controller.uiService.enable }}
{{- if .Values.controller.uiIngress.enable }}
{{- with .Values.controller.uiIngress.urlFormat }}
- --ingress-url-format={{ . }}
{{- end }}
{{- with .Values.controller.uiIngress.ingressClassName }}
- --ingress-class-name={{ . }}
{{- end }}
{{- with .Values.controller.uiIngress.tls }}
- --ingress-tls={{ . | toJson }}
{{- end }}
{{- with .Values.controller.uiIngress.annotations }}
- --ingress-annotations={{ . | toJson }}
{{- end }}
{{- end }}
{{- if .Values.controller.batchScheduler.enable }}
- --enable-batch-scheduler=true
{{- with .Values.controller.batchScheduler.kubeSchedulerNames }}
- --kube-scheduler-names={{ . | join "," }}
{{- end }}
{{- with .Values.controller.batchScheduler.default }}
- --default-batch-scheduler={{ . }}
{{- end }}
{{- end }}
{{- if .Values.prometheus.metrics.enable }}
- --enable-metrics=true
- --metrics-bind-address=:{{ .Values.prometheus.metrics.port }}
- --metrics-endpoint={{ .Values.prometheus.metrics.endpoint }}
- --metrics-prefix={{ .Values.prometheus.metrics.prefix }}
- --metrics-labels=app_type
- --metrics-job-start-latency-buckets={{ .Values.prometheus.metrics.jobStartLatencyBuckets }}
{{- end }}
{{ if .Values.controller.leaderElection.enable }}
- --leader-election=true
- --leader-election-lock-name={{ include "spark-operator.controller.leaderElectionName" . }}
- --leader-election-lock-namespace={{ .Release.Namespace }}
{{- else -}}
- --leader-election=false
{{- end }}
{{- if .Values.controller.pprof.enable }}
- --pprof-bind-address=:{{ .Values.controller.pprof.port }}
{{- end }}
- --workqueue-ratelimiter-bucket-qps={{ .Values.controller.workqueueRateLimiter.bucketQPS }}
- --workqueue-ratelimiter-bucket-size={{ .Values.controller.workqueueRateLimiter.bucketSize }}
{{- if .Values.controller.workqueueRateLimiter.maxDelay.enable }}
- --workqueue-ratelimiter-max-delay={{ .Values.controller.workqueueRateLimiter.maxDelay.duration }}
{{- end }}
{{- if .Values.controller.driverPodCreationGracePeriod }}
- --driver-pod-creation-grace-period={{ .Values.controller.driverPodCreationGracePeriod }}
{{- end }}
{{- if .Values.controller.maxTrackedExecutorPerApp }}
- --max-tracked-executor-per-app={{ .Values.controller.maxTrackedExecutorPerApp }}
{{- end }}
{{- if or .Values.prometheus.metrics.enable .Values.controller.pprof.enable }}
ports:
{{- if .Values.controller.pprof.enable }}
- name: {{ .Values.controller.pprof.portName | quote }}
containerPort: {{ .Values.controller.pprof.port }}
{{- end }}
{{- if .Values.prometheus.metrics.enable }}
- name: {{ .Values.prometheus.metrics.portName | quote }}
containerPort: {{ .Values.prometheus.metrics.port }}
{{- end }}
{{- end }}
{{- with .Values.controller.env }}
env:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.controller.envFrom }}
envFrom:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.controller.volumeMounts }}
volumeMounts:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.controller.resources }}
resources:
{{- toYaml . | nindent 10 }}
{{- end }}
livenessProbe:
httpGet:
port: 8081
scheme: HTTP
path: /healthz
readinessProbe:
httpGet:
port: 8081
scheme: HTTP
path: /readyz
{{- with .Values.controller.securityContext }}
securityContext:
{{- toYaml . | nindent 10 }}
{{- end }}
{{- with .Values.controller.sidecars }}
{{- toYaml . | nindent 6 }}
{{- end }}
{{- with .Values.image.pullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 6 }}
{{- end }}
{{- with .Values.controller.volumes }}
volumes:
{{- toYaml . | nindent 6 }}
{{- end }}
{{- with .Values.controller.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.controller.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.controller.tolerations }}
tolerations:
{{- toYaml . | nindent 6 }}
{{- end }}
{{- with .Values.controller.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
serviceAccountName: {{ include "spark-operator.controller.serviceAccountName" . }}
automountServiceAccountToken: {{ .Values.controller.serviceAccount.automountServiceAccountToken }}
{{- with .Values.controller.podSecurityContext }}
securityContext:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if .Values.controller.topologySpreadConstraints }}
{{- if le (int .Values.controller.replicas) 1 }}
{{- fail "controller.replicas must be greater than 1 to enable topology spread constraints for controller pods"}}
{{- end }}
{{- $selectorLabels := include "spark-operator.controller.selectorLabels" . | fromYaml }}
{{- $labelSelectorDict := dict "labelSelector" ( dict "matchLabels" $selectorLabels ) }}
topologySpreadConstraints:
{{- range .Values.controller.topologySpreadConstraints }}
- {{ mergeOverwrite . $labelSelectorDict | toYaml | nindent 8 | trim }}
{{- end }}
{{- end }}

View File

@@ -1,34 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.controller.podDisruptionBudget.enable }}
{{- if le (int .Values.controller.replicas) 1 }}
{{- fail "controller.replicas must be greater than 1 to enable pod disruption budget for controller" }}
{{- end -}}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: {{ include "spark-operator.controller.podDisruptionBudgetName" . }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
spec:
selector:
matchLabels:
{{- include "spark-operator.controller.selectorLabels" . | nindent 6 }}
{{- with .Values.controller.podDisruptionBudget.minAvailable }}
minAvailable: {{ . }}
{{- end }}
{{- end }}

View File

@@ -1,164 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.controller.rbac.create -}}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "spark-operator.controller.clusterRoleName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
{{- with .Values.controller.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- get
{{- if not .Values.spark.jobNamespaces | or (has "" .Values.spark.jobNamespaces) }}
{{ include "spark-operator.controller.policyRules" . }}
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "spark-operator.controller.clusterRoleBindingName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
{{- with .Values.controller.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.controller.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ include "spark-operator.controller.clusterRoleName" . }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "spark-operator.controller.roleName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
{{- with .Values.controller.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
{{- if .Values.controller.leaderElection.enable }}
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs:
- create
- apiGroups:
- coordination.k8s.io
resources:
- leases
resourceNames:
- {{ include "spark-operator.controller.leaderElectionName" . }}
verbs:
- get
- update
{{- end }}
{{- if has .Release.Namespace .Values.spark.jobNamespaces }}
{{ include "spark-operator.controller.policyRules" . }}
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "spark-operator.controller.roleBindingName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
{{- with .Values.controller.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.controller.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "spark-operator.controller.roleName" . }}
{{- if and .Values.spark.jobNamespaces (not (has "" .Values.spark.jobNamespaces)) }}
{{- range $jobNamespace := .Values.spark.jobNamespaces }}
{{- if ne $jobNamespace $.Release.Namespace }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "spark-operator.controller.roleName" $ }}
namespace: {{ $jobNamespace }}
labels:
{{- include "spark-operator.controller.labels" $ | nindent 4 }}
{{- with $.Values.controller.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
{{ include "spark-operator.controller.policyRules" $ }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "spark-operator.controller.roleBindingName" $ }}
namespace: {{ $jobNamespace }}
labels:
{{- include "spark-operator.controller.labels" $ | nindent 4 }}
{{- with $.Values.controller.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.controller.serviceAccountName" $ }}
namespace: {{ $.Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "spark-operator.controller.roleName" $ }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -1,31 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.controller.pprof.enable }}
apiVersion: v1
kind: Service
metadata:
name: {{ include "spark-operator.controller.serviceName" . }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
spec:
selector:
{{- include "spark-operator.controller.selectorLabels" . | nindent 4 }}
ports:
- port: {{ .Values.controller.pprof.port }}
targetPort: {{ .Values.controller.pprof.portName | quote }}
name: {{ .Values.controller.pprof.portName }}
{{- end }}

View File

@@ -1,30 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.controller.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: {{ .Values.controller.serviceAccount.automountServiceAccountToken }}
metadata:
name: {{ include "spark-operator.controller.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.controller.labels" . | nindent 4 }}
{{- with .Values.controller.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}

View File

@@ -0,0 +1,110 @@
# If the admission webhook is enabled, then a post-install step is required
# to generate and install the secret in the operator namespace.
# In the post-install hook, the token corresponding to the operator service account
# is used to authenticate with the Kubernetes API server to install the secret bundle.
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "spark-operator.fullname" . }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "spark-operator.selectorLabels" . | nindent 6 }}
strategy:
type: Recreate
template:
metadata:
{{- if or .Values.podAnnotations .Values.metrics.enable }}
annotations:
{{- if .Values.metrics.enable }}
prometheus.io/scrape: "true"
prometheus.io/port: "{{ .Values.metrics.port }}"
prometheus.io/path: {{ .Values.metrics.endpoint }}
{{- end }}
{{- if .Values.podAnnotations }}
{{- toYaml .Values.podAnnotations | trim | nindent 8 }}
{{- end }}
{{- end }}
labels:
{{- include "spark-operator.selectorLabels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- toYaml . | trim | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ include "spark-operator.serviceAccountName" . }}
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
image: {{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
{{- toYaml .Values.securityContext | nindent 10 }}
{{- if .Values.metrics.enable }}
ports:
- name: {{ .Values.metrics.portName | quote }}
containerPort: {{ .Values.metrics.port }}
{{ end }}
args:
- -v={{ .Values.logLevel }}
- -logtostderr
- -namespace={{ .Values.sparkJobNamespace }}
- -enable-ui-service={{ .Values.uiService.enable }}
- -ingress-url-format={{ .Values.ingressUrlFormat }}
- -controller-threads={{ .Values.controllerThreads }}
- -resync-interval={{ .Values.resyncInterval }}
- -enable-batch-scheduler={{ .Values.batchScheduler.enable }}
- -label-selector-filter={{ .Values.labelSelectorFilter }}
{{- if .Values.metrics.enable }}
- -enable-metrics=true
- -metrics-labels=app_type
- -metrics-port={{ .Values.metrics.port }}
- -metrics-endpoint={{ .Values.metrics.endpoint }}
- -metrics-prefix={{ .Values.metrics.prefix }}
{{- end }}
{{- if .Values.webhook.enable }}
- -enable-webhook=true
- -webhook-svc-namespace={{ .Release.Namespace }}
- -webhook-port={{ .Values.webhook.port }}
- -webhook-svc-name={{ include "spark-operator.fullname" . }}-webhook
- -webhook-config-name={{ include "spark-operator.fullname" . }}-webhook-config
- -webhook-namespace-selector={{ .Values.webhook.namespaceSelector }}
{{- end }}
- -enable-resource-quota-enforcement={{ .Values.resourceQuotaEnforcement.enable }}
{{- if gt (int .Values.replicaCount) 1 }}
- -leader-election=true
- -leader-election-lock-namespace={{ default .Release.Namespace .Values.leaderElection.lockNamespace }}
- -leader-election-lock-name={{ .Values.leaderElection.lockName }}
{{- end }}
resources:
{{- toYaml .Values.resources | nindent 10 }}
{{- if .Values.webhook.enable }}
volumeMounts:
- name: webhook-certs
mountPath: /etc/webhook-certs
volumes:
- name: webhook-certs
secret:
secretName: {{ include "spark-operator.fullname" . }}-webhook-certs
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
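The header comment above describes the webhook bootstrap flow; the flags this deployment passes are assembled from values. Below is a sketch of a values.yaml that drives those args. Flag names come straight from the args list; the concrete values are illustrative assumptions. Note that the leader-election flags are only emitted when replicaCount is greater than 1.

# values.yaml sketch (illustrative only, not part of the chart)
replicaCount: 2            # >1 turns on -leader-election=true
logLevel: 2                # rendered as -v=2
sparkJobNamespace: ""      # "" watches all namespaces
uiService:
  enable: true             # -enable-ui-service=true
metrics:
  enable: true             # adds the prometheus.io/* pod annotations and -enable-metrics
  port: 10254
  portName: metrics
  endpoint: /metrics
  prefix: ""
webhook:
  enable: true             # emits the -enable-webhook flag block and mounts the cert secret
  port: 8080
  namespaceSelector: ""
leaderElection:
  lockName: spark-operator-lock
  lockNamespace: ""        # empty -> defaults to the release namespace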


@ -0,0 +1,19 @@
{{ if and .Values.metrics.enable .Values.podMonitor.enable }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: {{ include "spark-operator.name" . -}}-podmonitor
labels: {{ toYaml .Values.podMonitor.labels | nindent 4 }}
spec:
podMetricsEndpoints:
- interval: {{ .Values.podMonitor.podMetricsEndpoint.interval }}
port: {{ .Values.metrics.portName | quote }}
scheme: {{ .Values.podMonitor.podMetricsEndpoint.scheme }}
jobLabel: {{ .Values.podMonitor.jobLabel }}
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchLabels:
{{- include "spark-operator.selectorLabels" . | nindent 6 }}
{{ end }}
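The guard on the first line of this template requires both toggles. A values sketch with the keys taken from the template and assumed illustrative values:

# values.yaml sketch (illustrative only, not part of the chart)
metrics:
  enable: true          # PodMonitor is skipped entirely unless this is true
  portName: metrics     # must match the metrics container port name in the deployment
podMonitor:
  enable: true
  labels:
    release: prometheus # assumed label for Prometheus Operator discovery
  jobLabel: spark-operator-podmonitor
  podMetricsEndpoint:
    interval: 5s
    scheme: http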


@ -1,22 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{/*
Create the name of pod monitor
*/}}
{{- define "spark-operator.prometheus.podMonitorName" -}}
{{- include "spark-operator.fullname" . }}-podmonitor
{{- end -}}


@ -1,44 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.prometheus.podMonitor.create -}}
{{- if not .Values.prometheus.metrics.enable }}
{{- fail "`prometheus.metrics.enable` must be set to true when `prometheus.podMonitor.create` is true." }}
{{- end }}
{{- if not (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PodMonitor") }}
{{- fail "The cluster does not support the required API version `monitoring.coreos.com/v1` for `PodMonitor`." }}
{{- end }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: {{ include "spark-operator.prometheus.podMonitorName" . }}
{{- with .Values.prometheus.podMonitor.labels }}
labels:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
podMetricsEndpoints:
- interval: {{ .Values.prometheus.podMonitor.podMetricsEndpoint.interval }}
port: {{ .Values.prometheus.metrics.portName | quote }}
scheme: {{ .Values.prometheus.podMonitor.podMetricsEndpoint.scheme }}
jobLabel: {{ .Values.prometheus.podMonitor.jobLabel }}
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchLabels:
{{- include "spark-operator.selectorLabels" . | nindent 6 }}
{{- end }}
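In this layout the same settings nest under a prometheus block, and rendering hard-fails unless prometheus.metrics.enable is true and the cluster advertises monitoring.coreos.com/v1. A sketch with assumed values:

# values.yaml sketch (illustrative only, not part of the chart)
prometheus:
  metrics:
    enable: true        # required, otherwise the fail above aborts rendering
    portName: metrics
  podMonitor:
    create: true
    labels:
      release: prometheus
    jobLabel: spark-operator-podmonitor
    podMetricsEndpoint:
      interval: 5s
      scheme: http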


@ -0,0 +1,127 @@
{{- if or .Values.rbac.create .Values.rbac.createClusterRole }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "spark-operator.fullname" . }}
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-delete-policy": hook-failed
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- "*"
- apiGroups:
- ""
resources:
- services
- configmaps
- secrets
verbs:
- create
- get
- delete
- update
- apiGroups:
- extensions
- networking.k8s.io
resources:
- ingresses
verbs:
- create
- get
- delete
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- ""
resources:
- events
verbs:
- create
- update
- patch
- apiGroups:
- ""
resources:
- resourcequotas
verbs:
- get
- list
- watch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- create
- get
- update
- delete
- apiGroups:
- admissionregistration.k8s.io
resources:
- mutatingwebhookconfigurations
- validatingwebhookconfigurations
verbs:
- create
- get
- update
- delete
- apiGroups:
- sparkoperator.k8s.io
resources:
- sparkapplications
- sparkapplications/status
- scheduledsparkapplications
- scheduledsparkapplications/status
verbs:
- "*"
{{- if .Values.batchScheduler.enable }}
# required for the `volcano` batch scheduler
- apiGroups:
- scheduling.incubator.k8s.io
- scheduling.sigs.dev
- scheduling.volcano.sh
resources:
- podgroups
verbs:
- "*"
{{- end }}
{{- if .Values.webhook.enable }}
- apiGroups:
- batch
resources:
- jobs
verbs:
- delete
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "spark-operator.fullname" . }}
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-delete-policy": hook-failed
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
roleRef:
kind: ClusterRole
name: {{ include "spark-operator.fullname" . }}
apiGroup: rbac.authorization.k8s.io
{{- end }}
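Which RBAC objects render is controlled by overlapping toggles: the legacy rbac.create switch enables everything, while the finer-grained flags select the cluster-scoped objects above or the namespaced Role/RoleBinding in the following template. A sketch (flag names from the two or-conditions; values assumed):

# values.yaml sketch (illustrative only, not part of the chart)
rbac:
  create: false            # legacy switch: true renders both scopes
  createClusterRole: true  # renders only the ClusterRole/ClusterRoleBinding above
  createRole: true         # renders the namespaced spark Role/RoleBinding below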


@ -0,0 +1,14 @@
{{- if .Values.serviceAccounts.sparkoperator.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "spark-operator.serviceAccountName" . }}
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-delete-policy": hook-failed
{{- with .Values.serviceAccounts.sparkoperator.annotations }}
{{ toYaml . | indent 4 }}
{{- end }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
{{- end }}


@ -0,0 +1,45 @@
{{- if or .Values.rbac.create .Values.rbac.createRole }}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: spark-role
namespace: {{ default .Release.Namespace .Values.sparkJobNamespace }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- "*"
- apiGroups:
- ""
resources:
- services
verbs:
- "*"
- apiGroups:
- ""
resources:
- configmaps
verbs:
- "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: spark
namespace: {{ default .Release.Namespace .Values.sparkJobNamespace }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
subjects:
- kind: ServiceAccount
name: {{ include "spark.serviceAccountName" . }}
namespace: {{ default .Release.Namespace .Values.sparkJobNamespace }}
roleRef:
kind: Role
name: spark-role
apiGroup: rbac.authorization.k8s.io
{{- end }}


@ -0,0 +1,13 @@
{{- if .Values.serviceAccounts.spark.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "spark.serviceAccountName" . }}
namespace: {{ default .Release.Namespace .Values.sparkJobNamespace }}
{{- with .Values.serviceAccounts.spark.annotations }}
annotations:
{{ toYaml . | indent 4 }}
{{- end }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
{{- end }}
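The Role, RoleBinding, and ServiceAccount above all land in the single configured job namespace. A sketch of the values involved (keys from the templates; values assumed):

# values.yaml sketch (illustrative only, not part of the chart)
sparkJobNamespace: spark-jobs   # empty -> defaults to the release namespace
serviceAccounts:
  spark:
    create: true                # the account bound to the `spark-role` Role above
    annotations: {}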


@ -1,47 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{/*
Create the name of spark component
*/}}
{{- define "spark-operator.spark.name" -}}
{{- include "spark-operator.fullname" . }}-spark
{{- end -}}
{{/*
Create the name of the service account to be used by spark applications
*/}}
{{- define "spark-operator.spark.serviceAccountName" -}}
{{- if .Values.spark.serviceAccount.create -}}
{{- .Values.spark.serviceAccount.name | default (include "spark-operator.spark.name" .) -}}
{{- else -}}
{{- .Values.spark.serviceAccount.name | default "default" -}}
{{- end -}}
{{- end -}}
{{/*
Create the name of the role to be used by spark service account
*/}}
{{- define "spark-operator.spark.roleName" -}}
{{- include "spark-operator.spark.serviceAccountName" . }}
{{- end -}}
{{/*
Create the name of the role binding to be used by spark service account
*/}}
{{- define "spark-operator.spark.roleBindingName" -}}
{{- include "spark-operator.spark.serviceAccountName" . }}
{{- end -}}


@ -1,73 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.spark.rbac.create -}}
{{- range $jobNamespace := .Values.spark.jobNamespaces | default list }}
{{- if ne $jobNamespace "" }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "spark-operator.spark.roleName" $ }}
namespace: {{ $jobNamespace }}
labels:
{{- include "spark-operator.labels" $ | nindent 4 }}
{{- with $.Values.spark.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
- apiGroups:
- ""
resources:
- pods
- configmaps
- persistentvolumeclaims
- services
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- deletecollection
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "spark-operator.spark.roleBindingName" $ }}
namespace: {{ $jobNamespace }}
labels:
{{- include "spark-operator.labels" $ | nindent 4 }}
{{- with $.Values.spark.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.spark.serviceAccountName" $ }}
namespace: {{ $jobNamespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "spark-operator.spark.roleName" $ }}
{{- end }}
{{- end }}
{{- end }}


@ -1,34 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.spark.serviceAccount.create }}
{{- range $jobNamespace := .Values.spark.jobNamespaces | default list }}
{{- if ne $jobNamespace "" }}
---
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: {{ $.Values.spark.serviceAccount.automountServiceAccountToken }}
metadata:
name: {{ include "spark-operator.spark.serviceAccountName" $ }}
namespace: {{ $jobNamespace }}
labels: {{ include "spark-operator.labels" $ | nindent 4 }}
{{- with $.Values.spark.serviceAccount.annotations }}
annotations: {{ toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
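In this layout the chart loops over spark.jobNamespaces and stamps one ServiceAccount here (and one Role/RoleBinding in the RBAC template above) per non-empty entry. A sketch with assumed namespaces:

# values.yaml sketch (illustrative only, not part of the chart)
spark:
  jobNamespaces:
    - spark-team-a      # one ServiceAccount/Role/RoleBinding rendered per entry
    - spark-team-b
  serviceAccount:
    create: true
    automountServiceAccountToken: false
    annotations: {}
  rbac:
    create: true
    annotations: {}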


@ -0,0 +1,52 @@
{{ if .Values.webhook.enable }}
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "spark-operator.fullname" . }}-webhook-cleanup
annotations:
{{- toYaml .Values.webhook.cleanupAnnotations | nindent 4 }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
template:
metadata:
name: {{ include "spark-operator.fullname" . }}-webhook-cleanup
{{- if .Values.istio.enabled }}
annotations:
"sidecar.istio.io/inject": "false"
{{- end }}
spec:
serviceAccountName: {{ include "spark-operator.serviceAccountName" . }}
restartPolicy: OnFailure
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: clean-secret
image: {{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
{{- toYaml .Values.securityContext | nindent 10 }}
command:
- "/bin/sh"
- "-c"
- "curl -ik \
-X DELETE \
-H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" \
-H \"Accept: application/json\" \
-H \"Content-Type: application/json\" \
https://kubernetes.default.svc/api/v1/namespaces/{{ .Release.Namespace }}/secrets/{{ include "spark-operator.fullname" . }}-webhook-certs \
&& \
curl -ik \
-X DELETE \
-H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" \
-H \"Accept: application/json\" \
-H \"Content-Type: application/json\" \
--data \"{\\\"kind\\\":\\\"DeleteOptions\\\",\\\"apiVersion\\\":\\\"batch/v1\\\",\\\"propagationPolicy\\\":\\\"Foreground\\\"}\" \
https://kubernetes.default.svc/apis/batch/v1/namespaces/{{ .Release.Namespace }}/jobs/{{ include "spark-operator.fullname" . }}-webhook-init"
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
{{ end }}


@ -0,0 +1,42 @@
{{ if .Values.webhook.enable }}
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "spark-operator.fullname" . }}-webhook-init
annotations:
{{- toYaml .Values.webhook.initAnnotations | nindent 4 }}
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
template:
metadata:
name: {{ include "spark-operator.fullname" . }}-webhook-init
{{- if .Values.istio.enabled }}
annotations:
"sidecar.istio.io/inject": "false"
{{- end }}
spec:
serviceAccountName: {{ include "spark-operator.serviceAccountName" . }}
restartPolicy: OnFailure
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: main
image: {{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
{{- toYaml .Values.securityContext | nindent 10 }}
command: [
"/usr/bin/gencerts.sh",
"-n", "{{ .Release.Namespace }}",
"-s", "{{ include "spark-operator.fullname" . }}-webhook",
"-r", "{{ include "spark-operator.fullname" . }}-webhook-certs",
"-p"
]
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
{{ end }}
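Both jobs are meant to be sequenced by Helm hook annotations supplied through values: the init job generates the webhook certificates before install, and the cleanup job deletes the certificate secret and the init job afterwards. The chart's actual defaults live in its values.yaml; the annotations below are an assumption about that wiring, not a copy of the shipped defaults.

# values.yaml sketch (hook wiring assumed, illustrative only)
webhook:
  enable: true
  initAnnotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
  cleanupAnnotations:
    "helm.sh/hook": pre-delete, pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded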


@ -0,0 +1,15 @@
{{ if .Values.webhook.enable }}
kind: Service
apiVersion: v1
metadata:
name: {{ include "spark-operator.fullname" . }}-webhook
labels:
{{- include "spark-operator.labels" . | nindent 4 }}
spec:
ports:
- port: 443
targetPort: {{ .Values.webhook.port }}
name: webhook
selector:
{{- include "spark-operator.selectorLabels" . | nindent 4 }}
{{ end }}


@ -1,165 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{/*
Create the name of webhook component
*/}}
{{- define "spark-operator.webhook.name" -}}
{{- include "spark-operator.fullname" . }}-webhook
{{- end -}}
{{/*
Common labels for the webhook
*/}}
{{- define "spark-operator.webhook.labels" -}}
{{ include "spark-operator.labels" . }}
app.kubernetes.io/component: webhook
{{- end -}}
{{/*
Selector labels for the webhook
*/}}
{{- define "spark-operator.webhook.selectorLabels" -}}
{{ include "spark-operator.selectorLabels" . }}
app.kubernetes.io/component: webhook
{{- end -}}
{{/*
Create the name of service account to be used by webhook
*/}}
{{- define "spark-operator.webhook.serviceAccountName" -}}
{{- if .Values.webhook.serviceAccount.create -}}
{{ .Values.webhook.serviceAccount.name | default (include "spark-operator.webhook.name" .) }}
{{- else -}}
{{ .Values.webhook.serviceAccount.name | default "default" }}
{{- end -}}
{{- end -}}
{{/*
Create the name of the cluster role to be used by the webhook
*/}}
{{- define "spark-operator.webhook.clusterRoleName" -}}
{{ include "spark-operator.webhook.name" . }}
{{- end }}
{{/*
Create the name of the cluster role binding to be used by the webhook
*/}}
{{- define "spark-operator.webhook.clusterRoleBindingName" -}}
{{ include "spark-operator.webhook.clusterRoleName" . }}
{{- end }}
{{/*
Create the name of the role to be used by the webhook
*/}}
{{- define "spark-operator.webhook.roleName" -}}
{{ include "spark-operator.webhook.name" . }}
{{- end }}
{{/*
Create the name of the role binding to be used by the webhook
*/}}
{{- define "spark-operator.webhook.roleBindingName" -}}
{{ include "spark-operator.webhook.roleName" . }}
{{- end }}
{{/*
Create the name of the secret to be used by webhook
*/}}
{{- define "spark-operator.webhook.secretName" -}}
{{ include "spark-operator.webhook.name" . }}-certs
{{- end -}}
{{/*
Create the name of the service to be used by webhook
*/}}
{{- define "spark-operator.webhook.serviceName" -}}
{{ include "spark-operator.webhook.name" . }}-svc
{{- end -}}
{{/*
Create the name of mutating webhook configuration
*/}}
{{- define "spark-operator.mutatingWebhookConfigurationName" -}}
webhook.sparkoperator.k8s.io
{{- end -}}
{{/*
Create the name of validating webhook configuration
*/}}
{{- define "spark-operator.validatingWebhookConfigurationName" -}}
quotaenforcer.sparkoperator.k8s.io
{{- end -}}
{{/*
Create the name of the deployment to be used by webhook
*/}}
{{- define "spark-operator.webhook.deploymentName" -}}
{{ include "spark-operator.webhook.name" . }}
{{- end -}}
{{/*
Create the name of the lease resource to be used by leader election
*/}}
{{- define "spark-operator.webhook.leaderElectionName" -}}
{{ include "spark-operator.webhook.name" . }}-lock
{{- end -}}
{{/*
Create the name of the pod disruption budget to be used by webhook
*/}}
{{- define "spark-operator.webhook.podDisruptionBudgetName" -}}
{{ include "spark-operator.webhook.name" . }}-pdb
{{- end -}}
{{/*
Create the role policy rules for the webhook in every Spark job namespace
*/}}
{{- define "spark-operator.webhook.policyRules" -}}
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- resourcequotas
verbs:
- get
- list
- watch
- apiGroups:
- sparkoperator.k8s.io
resources:
- sparkapplications
- sparkapplications/status
- sparkapplications/finalizers
- scheduledsparkapplications
- scheduledsparkapplications/status
- scheduledsparkapplications/finalizers
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
{{- end -}}


@ -1,170 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "spark-operator.webhook.deploymentName" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.webhook.replicas }}
selector:
matchLabels:
{{- include "spark-operator.webhook.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "spark-operator.webhook.selectorLabels" . | nindent 8 }}
{{- with .Values.webhook.labels }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.annotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
containers:
- name: spark-operator-webhook
image: {{ include "spark-operator.image" . }}
{{- with .Values.image.pullPolicy }}
imagePullPolicy: {{ . }}
{{- end }}
args:
- webhook
- start
{{- with .Values.webhook.logLevel }}
- --zap-log-level={{ . }}
{{- end }}
{{- with .Values.webhook.logEncoder }}
- --zap-encoder={{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if has "" . }}
- --namespaces=""
{{- else }}
- --namespaces={{ . | join "," }}
{{- end }}
{{- end }}
- --webhook-secret-name={{ include "spark-operator.webhook.secretName" . }}
- --webhook-secret-namespace={{ .Release.Namespace }}
- --webhook-svc-name={{ include "spark-operator.webhook.serviceName" . }}
- --webhook-svc-namespace={{ .Release.Namespace }}
- --webhook-port={{ .Values.webhook.port }}
- --mutating-webhook-name={{ include "spark-operator.webhook.name" . }}
- --validating-webhook-name={{ include "spark-operator.webhook.name" . }}
{{- with .Values.webhook.resourceQuotaEnforcement.enable }}
- --enable-resource-quota-enforcement=true
{{- end }}
{{- if .Values.certManager.enable }}
- --enable-cert-manager=true
{{- end }}
{{- if .Values.prometheus.metrics.enable }}
- --enable-metrics=true
- --metrics-bind-address=:{{ .Values.prometheus.metrics.port }}
- --metrics-endpoint={{ .Values.prometheus.metrics.endpoint }}
- --metrics-prefix={{ .Values.prometheus.metrics.prefix }}
- --metrics-labels=app_type
{{- end }}
        {{- if .Values.webhook.leaderElection.enable }}
        - --leader-election=true
        - --leader-election-lock-name={{ include "spark-operator.webhook.leaderElectionName" . }}
        - --leader-election-lock-namespace={{ .Release.Namespace }}
        {{- else }}
        - --leader-election=false
        {{- end }}
ports:
- name: {{ .Values.webhook.portName | quote }}
containerPort: {{ .Values.webhook.port }}
{{- if .Values.prometheus.metrics.enable }}
- name: {{ .Values.prometheus.metrics.portName | quote }}
containerPort: {{ .Values.prometheus.metrics.port }}
{{- end }}
{{- with .Values.webhook.env }}
env:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.envFrom }}
envFrom:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.volumeMounts }}
volumeMounts:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.resources }}
resources:
{{- toYaml . | nindent 10 }}
{{- end }}
livenessProbe:
httpGet:
port: 8081
scheme: HTTP
path: /healthz
readinessProbe:
httpGet:
port: 8081
scheme: HTTP
path: /readyz
{{- with .Values.webhook.securityContext }}
securityContext:
{{- toYaml . | nindent 10 }}
{{- end }}
{{- with .Values.webhook.sidecars }}
{{- toYaml . | nindent 6 }}
{{- end }}
{{- with .Values.image.pullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.volumes }}
volumes:
{{- toYaml . | nindent 6 }}
{{- end }}
{{- with .Values.webhook.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.webhook.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
serviceAccountName: {{ include "spark-operator.webhook.serviceAccountName" . }}
automountServiceAccountToken: {{ .Values.webhook.serviceAccount.automountServiceAccountToken }}
{{- with .Values.webhook.podSecurityContext }}
securityContext:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if .Values.webhook.topologySpreadConstraints }}
{{- if le (int .Values.webhook.replicas) 1 }}
{{- fail "webhook.replicas must be greater than 1 to enable topology spread constraints for webhook pods"}}
{{- end }}
{{- $selectorLabels := include "spark-operator.webhook.selectorLabels" . | fromYaml }}
{{- $labelSelectorDict := dict "labelSelector" ( dict "matchLabels" $selectorLabels ) }}
topologySpreadConstraints:
{{- range .Values.webhook.topologySpreadConstraints }}
- {{ mergeOverwrite . $labelSelectorDict | toYaml | nindent 8 | trim }}
{{- end }}
{{- end }}
{{- end }}
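The topology-spread guard at the bottom refuses to render unless more than one replica is requested, since spreading a single pod is meaningless; the template then injects the chart's selector labels into each constraint. A values sketch that passes the guard (values assumed):

# values.yaml sketch (illustrative only, not part of the chart)
webhook:
  enable: true
  replicas: 2                  # must be > 1, or the fail above aborts rendering
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
    # labelSelector is merged in by the template from the selector labels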


@ -1,128 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
name: {{ include "spark-operator.webhook.name" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- if .Values.certManager.enable }}
annotations:
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "spark-operator.certManager.certificate.name" . }}
{{- end }}
webhooks:
- name: mutate--v1-pod.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]
clientConfig:
service:
name: {{ include "spark-operator.webhook.serviceName" . }}
namespace: {{ .Release.Namespace }}
port: {{ .Values.webhook.port }}
path: /mutate--v1-pod
sideEffects: NoneOnDryRun
{{- with .Values.webhook.failurePolicy }}
failurePolicy: {{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if not (has "" .) }}
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
{{- range $jobNamespace := . }}
- {{ $jobNamespace }}
{{- end }}
{{- end }}
{{- end }}
objectSelector:
matchLabels:
sparkoperator.k8s.io/launched-by-spark-operator: "true"
rules:
- apiGroups: [""]
apiVersions: ["v1"]
resources: ["pods"]
operations: ["CREATE"]
{{- with .Values.webhook.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
- name: mutate-sparkoperator-k8s-io-v1beta2-sparkapplication.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]
clientConfig:
service:
name: {{ include "spark-operator.webhook.serviceName" . }}
namespace: {{ .Release.Namespace }}
port: {{ .Values.webhook.port }}
path: /mutate-sparkoperator-k8s-io-v1beta2-sparkapplication
sideEffects: NoneOnDryRun
{{- with .Values.webhook.failurePolicy }}
failurePolicy: {{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if not (has "" .) }}
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
{{- range $jobNamespace := . }}
- {{ $jobNamespace }}
{{- end }}
{{- end }}
{{- end }}
rules:
- apiGroups: ["sparkoperator.k8s.io"]
apiVersions: ["v1beta2"]
resources: ["sparkapplications"]
operations: ["CREATE", "UPDATE"]
{{- with .Values.webhook.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
- name: mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]
clientConfig:
service:
name: {{ include "spark-operator.webhook.serviceName" . }}
namespace: {{ .Release.Namespace }}
port: {{ .Values.webhook.port }}
path: /mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication
sideEffects: NoneOnDryRun
{{- with .Values.webhook.failurePolicy }}
failurePolicy: {{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if not (has "" .) }}
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
{{- range $jobNamespace := . }}
- {{ $jobNamespace }}
{{- end }}
{{- end }}
{{- end }}
rules:
- apiGroups: ["sparkoperator.k8s.io"]
apiVersions: ["v1beta2"]
resources: ["scheduledsparkapplications"]
operations: ["CREATE", "UPDATE"]
{{- with .Values.webhook.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
{{- end }}
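All three webhooks above share the same scoping rule: when spark.jobNamespaces is non-empty and contains no empty string, a namespaceSelector restricts admission to exactly those namespaces; otherwise the webhooks apply cluster-wide. A sketch of the relevant knobs (keys from the template; values assumed):

# values.yaml sketch (illustrative only, not part of the chart)
spark:
  jobNamespaces:
    - spark-team-a     # becomes kubernetes.io/metadata.name In [spark-team-a]
webhook:
  enable: true
  port: 9443
  failurePolicy: Fail  # assumed; Ignore would admit pods while the webhook is down
  timeoutSeconds: 10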


@ -1,36 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
{{- if .Values.webhook.podDisruptionBudget.enable }}
{{- if le (int .Values.webhook.replicas) 1 }}
{{- fail "webhook.replicas must be greater than 1 to enable pod disruption budget for webhook" }}
{{- end -}}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: {{ include "spark-operator.webhook.podDisruptionBudgetName" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
spec:
selector:
matchLabels:
{{- include "spark-operator.webhook.selectorLabels" . | nindent 6 }}
{{- with .Values.webhook.podDisruptionBudget.minAvailable }}
minAvailable: {{ . }}
{{- end }}
{{- end }}
{{- end }}


@ -1,195 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
{{- if .Values.webhook.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "spark-operator.webhook.clusterRoleName" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- with .Values.webhook.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
- apiGroups:
- ""
resources:
- events
verbs:
- create
- update
- patch
- apiGroups:
- admissionregistration.k8s.io
resources:
- mutatingwebhookconfigurations
- validatingwebhookconfigurations
verbs:
- list
- watch
- apiGroups:
- admissionregistration.k8s.io
resources:
- mutatingwebhookconfigurations
- validatingwebhookconfigurations
resourceNames:
- {{ include "spark-operator.webhook.name" . }}
verbs:
- get
- update
{{- if or (not .Values.spark.jobNamespaces) (has "" .Values.spark.jobNamespaces) }}
{{ include "spark-operator.webhook.policyRules" . }}
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "spark-operator.webhook.clusterRoleBindingName" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- with .Values.webhook.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.webhook.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ include "spark-operator.webhook.clusterRoleName" . }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "spark-operator.webhook.roleName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- with .Values.webhook.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- apiGroups:
- ""
resources:
- secrets
resourceNames:
- {{ include "spark-operator.webhook.secretName" . }}
verbs:
- get
- update
{{- if .Values.webhook.leaderElection.enable }}
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs:
- create
- apiGroups:
- coordination.k8s.io
resources:
- leases
resourceNames:
- {{ include "spark-operator.webhook.leaderElectionName" . }}
verbs:
- get
- update
{{- end }}
{{- if has .Release.Namespace .Values.spark.jobNamespaces }}
{{ include "spark-operator.webhook.policyRules" . }}
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "spark-operator.webhook.roleBindingName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- with .Values.webhook.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.webhook.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "spark-operator.webhook.roleName" . }}
{{- if and .Values.spark.jobNamespaces (not (has "" .Values.spark.jobNamespaces)) }}
{{- range $jobNamespace := .Values.spark.jobNamespaces }}
{{- if ne $jobNamespace $.Release.Namespace }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "spark-operator.webhook.roleName" $ }}
namespace: {{ $jobNamespace }}
labels:
{{- include "spark-operator.webhook.labels" $ | nindent 4 }}
{{- with $.Values.webhook.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
{{ include "spark-operator.webhook.policyRules" $ }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "spark-operator.webhook.roleBindingName" $ }}
namespace: {{ $jobNamespace }}
labels:
{{- include "spark-operator.webhook.labels" $ | nindent 4 }}
{{- with $.Values.webhook.rbac.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
subjects:
- kind: ServiceAccount
name: {{ include "spark-operator.webhook.serviceAccountName" $ }}
namespace: {{ $.Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "spark-operator.webhook.roleName" $ }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}

View File

@ -1,31 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
apiVersion: v1
kind: Service
metadata:
name: {{ include "spark-operator.webhook.serviceName" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
spec:
selector:
{{- include "spark-operator.webhook.selectorLabels" . | nindent 4 }}
ports:
- port: {{ .Values.webhook.port }}
targetPort: {{ .Values.webhook.portName | quote }}
name: {{ .Values.webhook.portName }}
{{- end }}


@ -1,32 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
{{- if .Values.webhook.serviceAccount.create -}}
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: {{ .Values.webhook.serviceAccount.automountServiceAccountToken }}
metadata:
name: {{ include "spark-operator.webhook.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- with .Values.webhook.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- end }}


@ -1,93 +0,0 @@
{{/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/}}
{{- if .Values.webhook.enable }}
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
name: {{ include "spark-operator.webhook.name" . }}
labels:
{{- include "spark-operator.webhook.labels" . | nindent 4 }}
{{- if .Values.certManager.enable }}
annotations:
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "spark-operator.certManager.certificate.name" . }}
{{- end }}
webhooks:
- name: validate-sparkoperator-k8s-io-v1beta2-sparkapplication.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]
clientConfig:
service:
name: {{ include "spark-operator.webhook.serviceName" . }}
namespace: {{ .Release.Namespace }}
port: {{ .Values.webhook.port }}
path: /validate-sparkoperator-k8s-io-v1beta2-sparkapplication
sideEffects: NoneOnDryRun
{{- with .Values.webhook.failurePolicy }}
failurePolicy: {{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if not (has "" .) }}
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
{{- range $jobNamespace := . }}
- {{ $jobNamespace }}
{{- end }}
{{- end }}
{{- end }}
rules:
- apiGroups: ["sparkoperator.k8s.io"]
apiVersions: ["v1beta2"]
resources: ["sparkapplications"]
operations: ["CREATE", "UPDATE"]
{{- with .Values.webhook.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
- name: validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication.sparkoperator.k8s.io
admissionReviewVersions: ["v1"]
clientConfig:
service:
name: {{ include "spark-operator.webhook.serviceName" . }}
namespace: {{ .Release.Namespace }}
port: {{ .Values.webhook.port }}
path: /validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication
sideEffects: NoneOnDryRun
{{- with .Values.webhook.failurePolicy }}
failurePolicy: {{ . }}
{{- end }}
{{- with .Values.spark.jobNamespaces }}
{{- if not (has "" .) }}
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
{{- range $jobNamespace := . }}
- {{ $jobNamespace }}
{{- end }}
{{- end }}
{{- end }}
rules:
- apiGroups: ["sparkoperator.k8s.io"]
apiVersions: ["v1beta2"]
resources: ["scheduledsparkapplications"]
operations: ["CREATE", "UPDATE"]
{{- with .Values.webhook.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
{{- end }}


@ -1,134 +0,0 @@
#
# Copyright 2025 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test CertManager Certificate
templates:
- certmanager/certificate.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create Certificate if `webhook.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: false
certManager:
enable: true
asserts:
- hasDocuments:
count: 0
- it: Should not create Certificate if `certManager.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should create Certificate if `webhook.enable` is `true` and `certManager.enable` is `true`
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
asserts:
- containsDocument:
apiVersion: cert-manager.io/v1
kind: Certificate
name: spark-operator-certificate
namespace: spark-operator
- it: Should fail if the cluster does not support `cert-manager.io/v1/Certificate`
set:
webhook:
enable: true
certManager:
enable: true
asserts:
- failedTemplate:
errorMessage: "The cluster does not support the required API version `cert-manager.io/v1` for `Certificate`."
  - it: Should use self signed issuer if `certManager.issuerRef` is not set
    capabilities:
      apiVersions:
        - cert-manager.io/v1/Certificate
    set:
      webhook:
        enable: true
      certManager:
        enable: true
        issuerRef: null
    asserts:
      - equal:
          path: spec.issuerRef
          value:
            group: cert-manager.io
            kind: Issuer
            name: spark-operator-self-signed-issuer
- it: Should use the specified issuer if `certManager.issuerRef` is set
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
issuerRef:
group: cert-manager.io
kind: Issuer
name: test-issuer
asserts:
- equal:
path: spec.issuerRef
value:
group: cert-manager.io
kind: Issuer
name: test-issuer
- it: Should use the specified duration if `certManager.duration` is set
capabilities:
apiVersions:
- cert-manager.io/v1/Certificate
set:
webhook:
enable: true
certManager:
enable: true
duration: 8760h
asserts:
- equal:
path: spec.duration
value: 8760h


@ -1,95 +0,0 @@
#
# Copyright 2025 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test CertManager Issuer
templates:
- certmanager/issuer.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create Issuer if `webhook.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: false
certManager:
enable: true
asserts:
- hasDocuments:
count: 0
- it: Should not create Issuer if `certManager.enable` is `false`
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: true
certManager:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should not create Issuer if `certManager.issuerRef` is set
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: true
certManager:
enable: true
issuerRef:
group: cert-manager.io
kind: Issuer
name: test-issuer
asserts:
- hasDocuments:
count: 0
- it: Should fail if the cluster does not support `cert-manager.io/v1/Issuer`
set:
webhook:
enable: true
certManager:
enable: true
asserts:
- failedTemplate:
errorMessage: "The cluster does not support the required API version `cert-manager.io/v1` for `Issuer`."
- it: Should create Issuer if `webhook.enable` is `true` and `certManager.enable` is `true`
capabilities:
apiVersions:
- cert-manager.io/v1/Issuer
set:
webhook:
enable: true
certManager:
enable: true
issuerRef: null
asserts:
- containsDocument:
apiVersion: cert-manager.io/v1
kind: Issuer
name: spark-operator-self-signed-issuer
namespace: spark-operator


@ -1,729 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test controller deployment
templates:
- controller/deployment.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should use the specified image repository if `image.registry`, `image.repository` and `image.tag` are set
set:
image:
registry: test-registry
repository: test-repository
tag: test-tag
asserts:
- equal:
path: spec.template.spec.containers[0].image
value: test-registry/test-repository:test-tag
- it: Should use the specified image pull policy if `image.pullPolicy` is set
set:
image:
pullPolicy: Always
asserts:
- equal:
path: spec.template.spec.containers[*].imagePullPolicy
value: Always
- it: Should set replicas if `controller.replicas` is set
set:
controller:
replicas: 10
asserts:
- equal:
path: spec.replicas
value: 10
  - it: Should set replicas to 0 if `controller.replicas` is set to 0
set:
controller:
replicas: 0
asserts:
- equal:
path: spec.replicas
value: 0
- it: Should add pod labels if `controller.labels` is set
set:
controller:
labels:
key1: value1
key2: value2
asserts:
- equal:
path: spec.template.metadata.labels.key1
value: value1
- equal:
path: spec.template.metadata.labels.key2
value: value2
  - it: Should add prometheus annotations if `prometheus.metrics.enable` is true
set:
prometheus:
metrics:
enable: true
port: 10254
endpoint: /metrics
asserts:
- equal:
path: spec.template.metadata.annotations["prometheus.io/scrape"]
value: "true"
- equal:
path: spec.template.metadata.annotations["prometheus.io/port"]
value: "10254"
- equal:
path: spec.template.metadata.annotations["prometheus.io/path"]
value: /metrics
- it: Should add pod annotations if `controller.annotations` is set
set:
controller:
annotations:
key1: value1
key2: value2
asserts:
- equal:
path: spec.template.metadata.annotations.key1
value: value1
- equal:
path: spec.template.metadata.annotations.key2
value: value2
- it: Should contain `--zap-log-level` arg if `controller.logLevel` is set
set:
controller:
logLevel: debug
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --zap-log-level=debug
- it: Should contain `--namespaces` arg if `spark.jobNamespaces` is set
set:
spark:
jobNamespaces:
- ns1
- ns2
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --namespaces=ns1,ns2
- it: Should set namespaces to all namespaces (`""`) if `spark.jobNamespaces` contains empty string
set:
spark:
jobNamespaces:
- ""
- default
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --namespaces=""
- it: Should contain `--controller-threads` arg if `controller.workers` is set
set:
controller:
workers: 30
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --controller-threads=30
- it: Should contain `--enable-ui-service` arg if `controller.uiService.enable` is set to `true`
set:
controller:
uiService:
enable: true
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --enable-ui-service=true
- it: Should contain `--ingress-url-format` arg if `controller.uiIngress.enable` is set to `true` and `controller.uiIngress.urlFormat` is set
set:
controller:
uiService:
enable: true
uiIngress:
enable: true
urlFormat: "{{$appName}}.example.com/{{$appNamespace}}/{{$appName}}"
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --ingress-url-format={{$appName}}.example.com/{{$appNamespace}}/{{$appName}}
- it: Should contain `--ingress-class-name` arg if `controller.uiIngress.enable` is set to `true` and `controller.uiIngress.ingressClassName` is set
set:
controller:
uiService:
enable: true
uiIngress:
enable: true
ingressClassName: nginx
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --ingress-class-name=nginx
- it: Should contain `--ingress-tls` arg if `controller.uiIngress.enable` is set to `true` and `controller.uiIngress.tls` is set
set:
controller:
uiService:
enable: true
uiIngress:
enable: true
tls:
- hosts:
- "*.test.com"
secretName: test-secret
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: '--ingress-tls=[{"hosts":["*.test.com"],"secretName":"test-secret"}]'
- it: Should contain `--ingress-annotations` arg if `controller.uiIngress.enable` is set to `true` and `controller.uiIngress.annotations` is set
set:
controller:
uiService:
enable: true
uiIngress:
enable: true
annotations:
cert-manager.io/cluster-issuer: "letsencrypt"
kubernetes.io/ingress.class: nginx
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: '--ingress-annotations={"cert-manager.io/cluster-issuer":"letsencrypt","kubernetes.io/ingress.class":"nginx"}'
- it: Should contain `--enable-batch-scheduler` arg if `controller.batchScheduler.enable` is `true`
set:
controller:
batchScheduler:
enable: true
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --enable-batch-scheduler=true
- it: Should contain `--default-batch-scheduler` arg if `controller.batchScheduler.default` is set
set:
controller:
batchScheduler:
enable: true
default: yunikorn
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --default-batch-scheduler=yunikorn
- it: Should contain `--enable-metrics` arg if `prometheus.metrics.enable` is set to `true`
set:
prometheus:
metrics:
enable: true
port: 12345
portName: test-port
endpoint: /test-endpoint
prefix: test-prefix
jobStartLatencyBuckets: "180,360,420,690"
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --enable-metrics=true
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --metrics-bind-address=:12345
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --metrics-endpoint=/test-endpoint
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --metrics-prefix=test-prefix
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --metrics-labels=app_type
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --metrics-job-start-latency-buckets=180,360,420,690
- it: Should enable leader election by default
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --leader-election=true
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --leader-election-lock-name=spark-operator-controller-lock
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --leader-election-lock-namespace=spark-operator
- it: Should disable leader election if `controller.leaderElection.enable` is set to `false`
set:
controller:
leaderElection:
enable: false
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --leader-election=false
- it: Should add metric ports if `prometheus.metrics.enable` is true
set:
prometheus:
metrics:
enable: true
port: 10254
portName: metrics
asserts:
- contains:
path: spec.template.spec.containers[0].ports
content:
name: metrics
containerPort: 10254
count: 1
- it: Should add environment variables if `controller.env` is set
set:
controller:
env:
- name: ENV_NAME_1
value: ENV_VALUE_1
- name: ENV_NAME_2
valueFrom:
configMapKeyRef:
name: test-configmap
key: test-key
optional: false
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: ENV_NAME_1
value: ENV_VALUE_1
- contains:
path: spec.template.spec.containers[0].env
content:
name: ENV_NAME_2
valueFrom:
configMapKeyRef:
name: test-configmap
key: test-key
optional: false
- it: Should add environment variable sources if `controller.envFrom` is set
set:
controller:
envFrom:
- configMapRef:
name: test-configmap
optional: false
- secretRef:
name: test-secret
optional: false
asserts:
- contains:
path: spec.template.spec.containers[0].envFrom
content:
configMapRef:
name: test-configmap
optional: false
- contains:
path: spec.template.spec.containers[0].envFrom
content:
secretRef:
name: test-secret
optional: false
- it: Should add volume mounts if `controller.volumeMounts` is set
set:
controller:
volumeMounts:
- name: volume1
mountPath: /volume1
- name: volume2
mountPath: /volume2
asserts:
- contains:
path: spec.template.spec.containers[0].volumeMounts
content:
name: volume1
mountPath: /volume1
- contains:
path: spec.template.spec.containers[0].volumeMounts
content:
name: volume2
mountPath: /volume2
- it: Should add resources if `controller.resources` is set
set:
controller:
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
asserts:
- equal:
path: spec.template.spec.containers[0].resources
value:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- it: Should add container securityContext if `controller.securityContext` is set
set:
controller:
securityContext:
readOnlyRootFilesystem: true
runAsUser: 1000
runAsGroup: 2000
fsGroup: 3000
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsNonRoot: true
privileged: false
asserts:
- equal:
path: spec.template.spec.containers[0].securityContext.readOnlyRootFilesystem
value: true
- equal:
path: spec.template.spec.containers[0].securityContext.runAsUser
value: 1000
- equal:
path: spec.template.spec.containers[0].securityContext.runAsGroup
value: 2000
- equal:
path: spec.template.spec.containers[0].securityContext.fsGroup
value: 3000
- equal:
path: spec.template.spec.containers[0].securityContext.allowPrivilegeEscalation
value: false
- equal:
path: spec.template.spec.containers[0].securityContext.capabilities
value:
drop:
- ALL
- equal:
path: spec.template.spec.containers[0].securityContext.runAsNonRoot
value: true
- equal:
path: spec.template.spec.containers[0].securityContext.privileged
value: false
- it: Should add sidecars if `controller.sidecars` is set
set:
controller:
sidecars:
- name: sidecar1
image: sidecar-image1
- name: sidecar2
image: sidecar-image2
asserts:
- contains:
path: spec.template.spec.containers
content:
name: sidecar1
image: sidecar-image1
- contains:
path: spec.template.spec.containers
content:
name: sidecar2
image: sidecar-image2
- it: Should add secrets if `image.pullSecrets` is set
set:
image:
pullSecrets:
- name: test-secret1
- name: test-secret2
asserts:
- equal:
path: spec.template.spec.imagePullSecrets[0].name
value: test-secret1
- equal:
path: spec.template.spec.imagePullSecrets[1].name
value: test-secret2
- it: Should add volumes if `controller.volumes` is set
set:
controller:
volumes:
- name: volume1
emptyDir: {}
- name: volume2
emptyDir: {}
asserts:
- contains:
path: spec.template.spec.volumes
content:
name: volume1
emptyDir: {}
count: 1
- contains:
path: spec.template.spec.volumes
content:
name: volume2
emptyDir: {}
count: 1
- it: Should add nodeSelector if `controller.nodeSelector` is set
set:
controller:
nodeSelector:
key1: value1
key2: value2
asserts:
- equal:
path: spec.template.spec.nodeSelector.key1
value: value1
- equal:
path: spec.template.spec.nodeSelector.key2
value: value2
- it: Should add affinity if `controller.affinity` is set
set:
controller:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- antarctica-east1
- antarctica-west1
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
asserts:
- equal:
path: spec.template.spec.affinity
value:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- antarctica-east1
- antarctica-west1
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
- it: Should add tolerations if `controller.tolerations` is set
set:
controller:
tolerations:
- key: key1
operator: Equal
value: value1
effect: NoSchedule
- key: key2
operator: Exists
effect: NoSchedule
asserts:
- equal:
path: spec.template.spec.tolerations
value:
- key: key1
operator: Equal
value: value1
effect: NoSchedule
- key: key2
operator: Exists
effect: NoSchedule
- it: Should add priorityClassName if `controller.priorityClassName` is set
set:
controller:
priorityClassName: test-priority-class
asserts:
- equal:
path: spec.template.spec.priorityClassName
value: test-priority-class
- it: Should add pod securityContext if `controller.podSecurityContext` is set
set:
controller:
podSecurityContext:
runAsUser: 1000
runAsGroup: 2000
fsGroup: 3000
asserts:
- equal:
path: spec.template.spec.securityContext
value:
runAsUser: 1000
runAsGroup: 2000
fsGroup: 3000
- it: Should not contain topologySpreadConstraints if `controller.topologySpreadConstraints` is not set
set:
controller:
topologySpreadConstraints: []
asserts:
- notExists:
path: spec.template.spec.topologySpreadConstraints
- it: Should add topologySpreadConstraints if `controller.topologySpreadConstraints` is set and `controller.replicas` is greater than 1
set:
controller:
replicas: 2
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
asserts:
- equal:
path: spec.template.spec.topologySpreadConstraints
value:
- labelSelector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: spark-operator
app.kubernetes.io/name: spark-operator
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- labelSelector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: spark-operator
app.kubernetes.io/name: spark-operator
maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
- it: Should fail if `controller.topologySpreadConstraints` is set and `controller.replicas` is not greater than 1
set:
controller:
replicas: 1
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
asserts:
- failedTemplate:
errorMessage: "controller.replicas must be greater than 1 to enable topology spread constraints for controller pods"
- it: Should contain `--pprof-bind-address` arg if `controller.pprof.enable` is set to `true`
set:
controller:
pprof:
enable: true
port: 12345
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --pprof-bind-address=:12345
- it: Should add pprof ports if `controller.pprof.enable` is set to `true`
set:
controller:
pprof:
enable: true
port: 12345
portName: pprof-test
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].ports
content:
name: pprof-test
containerPort: 12345
count: 1
- it: Should contain `--workqueue-ratelimiter-max-delay` arg if `controller.workqueueRateLimiter.maxDelay.enable` is set to `true`
set:
controller:
workqueueRateLimiter:
bucketQPS: 1
bucketSize: 2
maxDelay:
enable: true
duration: 3h
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --workqueue-ratelimiter-bucket-qps=1
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --workqueue-ratelimiter-bucket-size=2
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --workqueue-ratelimiter-max-delay=3h
  - it: Should not contain `--workqueue-ratelimiter-max-delay` arg if `controller.workqueueRateLimiter.maxDelay.enable` is set to `false`
    set:
      controller:
        workqueueRateLimiter:
          maxDelay:
            enable: false
            duration: 1h
asserts:
- notContains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --workqueue-ratelimiter-max-delay=1h
  - it: Should contain `--driver-pod-creation-grace-period` arg if `controller.driverPodCreationGracePeriod` is set
set:
controller:
driverPodCreationGracePeriod: 30s
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --driver-pod-creation-grace-period=30s
- it: Should contain `--max-tracked-executor-per-app` arg if `controller.maxTrackedExecutorPerApp` is set
set:
controller:
maxTrackedExecutorPerApp: 123
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-controller")].args
content: --max-tracked-executor-per-app=123


@ -1,68 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test controller pod disruption budget
templates:
- controller/poddisruptionbudget.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not render podDisruptionBudget if `controller.podDisruptionBudget.enable` is false
set:
controller:
podDisruptionBudget:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should fail if `controller.replicas` is less than 2 when `controller.podDisruptionBudget.enable` is true
set:
controller:
replicas: 1
podDisruptionBudget:
enable: true
asserts:
- failedTemplate:
errorMessage: "controller.replicas must be greater than 1 to enable pod disruption budget for controller"
- it: Should render spark operator podDisruptionBudget if `controller.podDisruptionBudget.enable` is true
set:
controller:
replicas: 2
podDisruptionBudget:
enable: true
asserts:
- containsDocument:
apiVersion: policy/v1
kind: PodDisruptionBudget
name: spark-operator-controller-pdb
- it: Should set minAvailable if `controller.podDisruptionBudget.minAvailable` is specified
set:
controller:
replicas: 2
podDisruptionBudget:
enable: true
minAvailable: 3
asserts:
- equal:
path: spec.minAvailable
value: 3


@ -1,165 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test controller rbac
templates:
- controller/rbac.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create controller RBAC resources if `controller.rbac.create` is false
set:
controller:
rbac:
create: false
asserts:
- hasDocuments:
count: 0
- it: Should create controller ClusterRole by default
documentIndex: 0
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
name: spark-operator-controller
- it: Should create controller ClusterRoleBinding by default
documentIndex: 1
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
name: spark-operator-controller
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-controller
namespace: spark-operator
count: 1
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: spark-operator-controller
- it: Should add extra annotations to controller ClusterRole if `controller.rbac.annotations` is set
set:
controller:
rbac:
annotations:
key1: value1
key2: value2
asserts:
- equal:
path: metadata.annotations.key1
value: value1
- equal:
path: metadata.annotations.key2
value: value2
- it: Should create role and rolebinding for controller in release namespace
documentIndex: 2
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-controller
namespace: spark-operator
- it: Should create role and rolebinding for controller in release namespace
documentIndex: 3
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-controller
namespace: spark-operator
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-controller
namespace: spark-operator
count: 1
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spark-operator-controller
- it: Should create roles and rolebindings for controller in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 4
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-controller
namespace: default
- it: Should create roles and rolebindings for controller in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 5
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-controller
namespace: default
- it: Should create roles and rolebindings for controller in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 6
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-controller
namespace: spark
- it: Should create roles and rolebindings for controller in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 7
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-controller
namespace: spark


@ -1,44 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test controller service
templates:
- controller/service.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should create the pprof service correctly
set:
controller:
pprof:
enable: true
port: 12345
portName: pprof-test
asserts:
- containsDocument:
apiVersion: v1
kind: Service
name: spark-operator-controller-svc
- equal:
path: spec.ports[0]
value:
port: 12345
targetPort: pprof-test
name: pprof-test


@ -1,67 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test controller service account
templates:
- controller/serviceaccount.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create controller service account if `controller.serviceAccount.create` is false
set:
controller:
serviceAccount:
create: false
asserts:
- hasDocuments:
count: 0
- it: Should create controller service account by default
asserts:
- containsDocument:
apiVersion: v1
kind: ServiceAccount
name: spark-operator-controller
- it: Should use the specified service account name if `controller.serviceAccount.name` is set
set:
controller:
serviceAccount:
name: custom-service-account
asserts:
- containsDocument:
apiVersion: v1
kind: ServiceAccount
name: custom-service-account
- it: Should add extra annotations if `controller.serviceAccount.annotations` is set
set:
controller:
serviceAccount:
annotations:
key1: value1
key2: value2
asserts:
- equal:
path: metadata.annotations.key1
value: value1
- equal:
path: metadata.annotations.key2
value: value2


@ -1,102 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test prometheus pod monitor
templates:
- prometheus/podmonitor.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create pod monitor by default
asserts:
- hasDocuments:
count: 0
- it: Should fail if `prometheus.podMonitor.create` is true and `prometheus.metrics.enable` is false
set:
prometheus:
metrics:
enable: false
podMonitor:
create: true
asserts:
- failedTemplate:
errorMessage: "`metrics.enable` must be set to true when `podMonitor.create` is true."
  - it: Should fail if the cluster does not support `monitoring.coreos.com/v1/PodMonitor` even if `prometheus.podMonitor.create` and `prometheus.metrics.enable` are both true
set:
prometheus:
metrics:
enable: true
podMonitor:
create: true
asserts:
- failedTemplate:
errorMessage: "The cluster does not support the required API version `monitoring.coreos.com/v1` for `PodMonitor`."
  - it: Should create pod monitor if the cluster supports `monitoring.coreos.com/v1/PodMonitor` and `prometheus.podMonitor.create` and `prometheus.metrics.enable` are both true
capabilities:
apiVersions:
- monitoring.coreos.com/v1/PodMonitor
set:
prometheus:
metrics:
enable: true
podMonitor:
create: true
asserts:
- containsDocument:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
name: spark-operator-podmonitor
- it: Should use the specified labels, jobLabel and podMetricsEndpoint
capabilities:
apiVersions:
- monitoring.coreos.com/v1/PodMonitor
set:
prometheus:
metrics:
enable: true
portName: custom-port
podMonitor:
create: true
labels:
key1: value1
key2: value2
jobLabel: custom-job-label
podMetricsEndpoint:
scheme: https
interval: 10s
asserts:
- equal:
path: metadata.labels
value:
key1: value1
key2: value2
- equal:
path: spec.podMetricsEndpoints[0]
value:
port: custom-port
scheme: https
interval: 10s
- equal:
path: spec.jobLabel
value: custom-job-label


@ -1,182 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test Spark RBAC
templates:
- spark/rbac.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create RBAC resources for Spark if `spark.rbac.create` is false
set:
spark:
rbac:
create: false
asserts:
- hasDocuments:
count: 0
- it: Should create RBAC resources for Spark in namespace `default` by default
documentIndex: 0
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-spark
namespace: default
- it: Should create RBAC resources for Spark in namespace `default` by default
documentIndex: 1
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-spark
namespace: default
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-spark
namespace: default
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spark-operator-spark
- it: Should create RBAC resources for Spark in every Spark job namespace
set:
spark:
jobNamespaces:
- ns1
- ns2
documentIndex: 0
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-spark
namespace: ns1
- it: Should create RBAC resources for Spark in every Spark job namespace
set:
spark:
jobNamespaces:
- ns1
- ns2
documentIndex: 1
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-spark
namespace: ns1
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-spark
namespace: ns1
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spark-operator-spark
- it: Should create RBAC resources for Spark in every Spark job namespace
set:
spark:
jobNamespaces:
- ns1
- ns2
documentIndex: 2
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-spark
namespace: ns2
- it: Should create RBAC resources for Spark in every Spark job namespace
set:
spark:
jobNamespaces:
- ns1
- ns2
documentIndex: 3
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-spark
namespace: ns2
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-spark
namespace: ns2
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spark-operator-spark
- it: Should use the specified service account name if `spark.serviceAccount.name` is set
set:
spark:
serviceAccount:
name: spark
documentIndex: 0
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark
namespace: default
- it: Should use the specified service account name if `spark.serviceAccount.name` is set
set:
spark:
serviceAccount:
name: spark
documentIndex: 1
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark
namespace: default
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark
namespace: default
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spark


@ -1,101 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test spark service account
templates:
- spark/serviceaccount.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create service account if `spark.serviceAccount.create` is false
set:
spark:
serviceAccount:
create: false
asserts:
- hasDocuments:
count: 0
- it: Should create service account by default
asserts:
- containsDocument:
apiVersion: v1
kind: ServiceAccount
name: spark-operator-spark
- it: Should use the specified service account name if `spark.serviceAccount.name` is set
set:
spark:
serviceAccount:
name: spark
asserts:
- containsDocument:
apiVersion: v1
kind: ServiceAccount
name: spark
- it: Should add extra annotations if `spark.serviceAccount.annotations` is set
set:
spark:
serviceAccount:
annotations:
key1: value1
key2: value2
asserts:
- equal:
path: metadata.annotations.key1
value: value1
- equal:
path: metadata.annotations.key2
value: value2
- it: Should create service account for every non-empty spark job namespace if `spark.jobNamespaces` is set with multiple values
set:
spark:
jobNamespaces:
- ""
- ns1
- ns2
documentIndex: 0
asserts:
- hasDocuments:
count: 2
- containsDocument:
apiVersion: v1
kind: ServiceAccount
name: spark-operator-spark
namespace: ns1
- it: Should create service account for every non-empty spark job namespace if `spark.jobNamespaces` is set with multiple values
set:
spark:
jobNamespaces:
- ""
- ns1
- ns2
documentIndex: 1
asserts:
- hasDocuments:
count: 2
- containsDocument:
apiVersion: v1
kind: ServiceAccount
name: spark-operator-spark
namespace: ns2


@ -1,566 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test webhook deployment
templates:
- webhook/deployment.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should create webhook deployment by default
asserts:
- containsDocument:
apiVersion: apps/v1
kind: Deployment
name: spark-operator-webhook
- it: Should not create webhook deployment if `webhook.enable` is `false`
set:
webhook:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should set replicas if `webhook.replicas` is set
set:
webhook:
replicas: 10
asserts:
- equal:
path: spec.replicas
value: 10
- it: Should set replicas if `webhook.replicas` is set
set:
webhook:
replicas: 0
asserts:
- equal:
path: spec.replicas
value: 0
- it: Should add pod labels if `webhook.labels` is set
set:
webhook:
labels:
key1: value1
key2: value2
asserts:
- equal:
path: spec.template.metadata.labels.key1
value: value1
- equal:
path: spec.template.metadata.labels.key2
value: value2
- it: Should add pod annotations if `webhook.annotations` is set
set:
webhook:
annotations:
key1: value1
key2: value2
asserts:
- equal:
path: spec.template.metadata.annotations.key1
value: value1
- equal:
path: spec.template.metadata.annotations.key2
value: value2
- it: Should use the specified image repository if `image.registry`, `image.repository` and `image.tag` are set
set:
image:
registry: test-registry
repository: test-repository
tag: test-tag
asserts:
- equal:
path: spec.template.spec.containers[0].image
value: test-registry/test-repository:test-tag
- it: Should use the specified image pull policy if `image.pullPolicy` is set
set:
image:
pullPolicy: Always
asserts:
- equal:
path: spec.template.spec.containers[0].imagePullPolicy
value: Always
- it: Should contain `--zap-log-level` arg if `webhook.logLevel` is set
set:
webhook:
logLevel: debug
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --zap-log-level=debug
- it: Should contain `--namespaces` arg if `spark.jobNamespaces` is set
set:
spark.jobNamespaces:
- ns1
- ns2
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --namespaces=ns1,ns2
- it: Should set namespaces to all namespaces (`""`) if `spark.jobNamespaces` contains empty string
set:
spark:
jobNamespaces:
- ""
- default
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --namespaces=""
- it: Should contain `--enable-metrics` arg if `prometheus.metrics.enable` is set to `true`
set:
prometheus:
metrics:
enable: true
port: 12345
portName: test-port
endpoint: /test-endpoint
prefix: test-prefix
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --enable-metrics=true
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --metrics-bind-address=:12345
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --metrics-endpoint=/test-endpoint
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --metrics-prefix=test-prefix
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --metrics-labels=app_type
- it: Should enable leader election by default
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --leader-election=true
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --leader-election-lock-name=spark-operator-webhook-lock
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --leader-election-lock-namespace=spark-operator
- it: Should disable leader election if `webhook.leaderElection.enable` is set to `false`
set:
webhook:
leaderElection:
enable: false
asserts:
- contains:
path: spec.template.spec.containers[?(@.name=="spark-operator-webhook")].args
content: --leader-election=false
- it: Should add webhook port
set:
webhook:
port: 12345
portName: test-port
asserts:
- contains:
path: spec.template.spec.containers[0].ports
content:
name: test-port
containerPort: 12345
- it: Should add metric port if `prometheus.metrics.enable` is true
set:
prometheus:
metrics:
enable: true
port: 10254
portName: metrics
asserts:
- contains:
path: spec.template.spec.containers[0].ports
content:
name: metrics
containerPort: 10254
count: 1
- it: Should add environment variables if `webhook.env` is set
set:
webhook:
env:
- name: ENV_NAME_1
value: ENV_VALUE_1
- name: ENV_NAME_2
valueFrom:
configMapKeyRef:
name: test-configmap
key: test-key
optional: false
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: ENV_NAME_1
value: ENV_VALUE_1
- contains:
path: spec.template.spec.containers[0].env
content:
name: ENV_NAME_2
valueFrom:
configMapKeyRef:
name: test-configmap
key: test-key
optional: false
- it: Should add environment variable sources if `webhook.envFrom` is set
set:
webhook:
envFrom:
- configMapRef:
name: test-configmap
optional: false
- secretRef:
name: test-secret
optional: false
asserts:
- contains:
path: spec.template.spec.containers[0].envFrom
content:
configMapRef:
name: test-configmap
optional: false
- contains:
path: spec.template.spec.containers[0].envFrom
content:
secretRef:
name: test-secret
optional: false
- it: Should add volume mounts if `webhook.volumeMounts` is set
set:
webhook:
volumeMounts:
- name: volume1
mountPath: /volume1
- name: volume2
mountPath: /volume2
asserts:
- contains:
path: spec.template.spec.containers[0].volumeMounts
content:
name: volume1
mountPath: /volume1
count: 1
- contains:
path: spec.template.spec.containers[0].volumeMounts
content:
name: volume2
mountPath: /volume2
count: 1
- it: Should add resources if `webhook.resources` is set
set:
webhook:
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
asserts:
- equal:
path: spec.template.spec.containers[0].resources
value:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- it: Should add container securityContext if `webhook.securityContext` is set
set:
webhook:
securityContext:
readOnlyRootFilesystem: true
runAsUser: 1000
runAsGroup: 2000
fsGroup: 3000
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsNonRoot: true
privileged: false
asserts:
- equal:
path: spec.template.spec.containers[0].securityContext.readOnlyRootFilesystem
value: true
- equal:
path: spec.template.spec.containers[0].securityContext.runAsUser
value: 1000
- equal:
path: spec.template.spec.containers[0].securityContext.runAsGroup
value: 2000
- equal:
path: spec.template.spec.containers[0].securityContext.fsGroup
value: 3000
- equal:
path: spec.template.spec.containers[0].securityContext.allowPrivilegeEscalation
value: false
- equal:
path: spec.template.spec.containers[0].securityContext.capabilities
value:
drop:
- ALL
- equal:
path: spec.template.spec.containers[0].securityContext.runAsNonRoot
value: true
- equal:
path: spec.template.spec.containers[0].securityContext.privileged
value: false
- it: Should add sidecars if `webhook.sidecars` is set
set:
webhook:
sidecars:
- name: sidecar1
image: sidecar-image1
- name: sidecar2
image: sidecar-image2
asserts:
- contains:
path: spec.template.spec.containers
content:
name: sidecar1
image: sidecar-image1
- contains:
path: spec.template.spec.containers
content:
name: sidecar2
image: sidecar-image2
- it: Should add secrets if `image.pullSecrets` is set
set:
image:
pullSecrets:
- name: test-secret1
- name: test-secret2
asserts:
- equal:
path: spec.template.spec.imagePullSecrets[0].name
value: test-secret1
- equal:
path: spec.template.spec.imagePullSecrets[1].name
value: test-secret2
- it: Should add volumes if `webhook.volumes` is set
set:
webhook:
volumes:
- name: volume1
emptyDir: {}
- name: volume2
emptyDir: {}
asserts:
- contains:
path: spec.template.spec.volumes
content:
name: volume1
emptyDir: {}
count: 1
- contains:
path: spec.template.spec.volumes
content:
name: volume2
emptyDir: {}
count: 1
- it: Should add nodeSelector if `webhook.nodeSelector` is set
set:
webhook:
nodeSelector:
key1: value1
key2: value2
asserts:
- equal:
path: spec.template.spec.nodeSelector.key1
value: value1
- equal:
path: spec.template.spec.nodeSelector.key2
value: value2
- it: Should add affinity if `webhook.affinity` is set
set:
webhook:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- antarctica-east1
- antarctica-west1
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
asserts:
- equal:
path: spec.template.spec.affinity
value:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- antarctica-east1
- antarctica-west1
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
- it: Should add tolerations if `webhook.tolerations` is set
set:
webhook:
tolerations:
- key: key1
operator: Equal
value: value1
effect: NoSchedule
- key: key2
operator: Exists
effect: NoSchedule
asserts:
- equal:
path: spec.template.spec.tolerations
value:
- key: key1
operator: Equal
value: value1
effect: NoSchedule
- key: key2
operator: Exists
effect: NoSchedule
- it: Should add priorityClassName if `webhook.priorityClassName` is set
set:
webhook:
priorityClassName: test-priority-class
asserts:
- equal:
path: spec.template.spec.priorityClassName
value: test-priority-class
- it: Should add pod securityContext if `webhook.podSecurityContext` is set
set:
webhook:
podSecurityContext:
runAsUser: 1000
runAsGroup: 2000
fsGroup: 3000
asserts:
- equal:
path: spec.template.spec.securityContext.runAsUser
value: 1000
- equal:
path: spec.template.spec.securityContext.runAsGroup
value: 2000
- equal:
path: spec.template.spec.securityContext.fsGroup
value: 3000
- it: Should not contain topologySpreadConstraints if `webhook.topologySpreadConstraints` is not set
set:
webhook:
topologySpreadConstraints: []
asserts:
- notExists:
path: spec.template.spec.topologySpreadConstraints
- it: Should add topologySpreadConstraints if `webhook.topologySpreadConstraints` is set and `webhook.replicas` is greater than 1
set:
webhook:
replicas: 2
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
asserts:
- equal:
path: spec.template.spec.topologySpreadConstraints
value:
- labelSelector:
matchLabels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: spark-operator
app.kubernetes.io/name: spark-operator
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- labelSelector:
matchLabels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: spark-operator
app.kubernetes.io/name: spark-operator
maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
- it: Should fail if `webhook.topologySpreadConstraints` is set and `webhook.replicas` is not greater than 1
set:
webhook:
replicas: 1
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
asserts:
- failedTemplate:
errorMessage: "webhook.replicas must be greater than 1 to enable topology spread constraints for webhook pods"


@ -1,99 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test mutating webhook configuration
templates:
- webhook/mutatingwebhookconfiguration.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should create the mutating webhook configuration by default
asserts:
- containsDocument:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
name: spark-operator-webhook
- it: Should not create the mutating webhook configuration if `webhook.enable` is `false`
set:
webhook:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should use the specified webhook port
set:
webhook:
port: 12345
asserts:
- equal:
path: webhooks[*].clientConfig.service.port
value: 12345
- it: Should use the specified failure policy
set:
webhook:
failurePolicy: Fail
asserts:
- equal:
path: webhooks[*].failurePolicy
value: Fail
- it: Should set namespaceSelector if `spark.jobNamespaces` is set with non-empty strings
set:
spark:
jobNamespaces:
- ns1
- ns2
- ns3
asserts:
- equal:
path: webhooks[*].namespaceSelector
value:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
- ns1
- ns2
- ns3
- it: Should not set namespaceSelector if `spark.jobNamespaces` contains empty string
set:
spark:
jobNamespaces:
- ""
- ns1
- ns2
- ns3
asserts:
- notExists:
path: webhooks[*].namespaceSelector
  - it: Should use the specified timeoutSeconds
set:
webhook:
timeoutSeconds: 5
asserts:
- equal:
path: webhooks[*].timeoutSeconds
value: 5


@ -1,76 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test webhook pod disruption budget
templates:
- webhook/poddisruptionbudget.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not render podDisruptionBudget if `webhook.enable` is `false`
set:
webhook:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should not render podDisruptionBudget if `webhook.podDisruptionBudget.enable` is false
set:
webhook:
podDisruptionBudget:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should fail if `webhook.replicas` is less than 2 when `webhook.podDisruptionBudget.enable` is true
set:
webhook:
replicas: 1
podDisruptionBudget:
enable: true
asserts:
- failedTemplate:
errorMessage: "webhook.replicas must be greater than 1 to enable pod disruption budget for webhook"
- it: Should render spark operator podDisruptionBudget if `webhook.podDisruptionBudget.enable` is true
set:
webhook:
replicas: 2
podDisruptionBudget:
enable: true
asserts:
- containsDocument:
apiVersion: policy/v1
kind: PodDisruptionBudget
name: spark-operator-webhook-pdb
- it: Should set minAvailable if `webhook.podDisruptionBudget.minAvailable` is specified
set:
webhook:
replicas: 2
podDisruptionBudget:
enable: true
minAvailable: 3
asserts:
- equal:
path: spec.minAvailable
value: 3


@ -1,165 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test webhook rbac
templates:
- webhook/rbac.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create webhook RBAC resources if `webhook.rbac.create` is false
set:
webhook:
rbac:
create: false
asserts:
- hasDocuments:
count: 0
- it: Should create webhook ClusterRole by default
documentIndex: 0
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
name: spark-operator-webhook
- it: Should create webhook ClusterRoleBinding by default
documentIndex: 1
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
name: spark-operator-webhook
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-webhook
namespace: spark-operator
count: 1
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: spark-operator-webhook
- it: Should add extra annotations to webhook ClusterRole if `webhook.rbac.annotations` is set
set:
webhook:
rbac:
annotations:
key1: value1
key2: value2
asserts:
- equal:
path: metadata.annotations.key1
value: value1
- equal:
path: metadata.annotations.key2
value: value2
- it: Should create role and rolebinding for webhook in release namespace
documentIndex: 2
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-webhook
namespace: spark-operator
- it: Should create role and rolebinding for webhook in release namespace
documentIndex: 3
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-webhook
namespace: spark-operator
- contains:
path: subjects
content:
kind: ServiceAccount
name: spark-operator-webhook
namespace: spark-operator
count: 1
- equal:
path: roleRef
value:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spark-operator-webhook
- it: Should create roles and rolebindings for webhook in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 4
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-webhook
namespace: default
- it: Should create roles and rolebindings for webhook in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 5
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-webhook
namespace: default
- it: Should create roles and rolebindings for webhook in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 6
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
name: spark-operator-webhook
namespace: spark
- it: Should create roles and rolebindings for webhook in every spark job namespace if `spark.jobNamespaces` is set and does not contain empty string
set:
spark:
jobNamespaces:
- default
- spark
documentIndex: 7
asserts:
- containsDocument:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
name: spark-operator-webhook
namespace: spark


@ -1,49 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test webhook service
templates:
- webhook/service.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should not create webhook service if `webhook.enable` is `false`
set:
webhook:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should create the webhook service correctly
set:
webhook:
portName: webhook
asserts:
- containsDocument:
apiVersion: v1
kind: Service
name: spark-operator-webhook-svc
- equal:
path: spec.ports[0]
value:
port: 9443
targetPort: webhook
name: webhook


@ -1,97 +0,0 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
suite: Test validating webhook configuration
templates:
- webhook/validatingwebhookconfiguration.yaml
release:
name: spark-operator
namespace: spark-operator
tests:
- it: Should create the validating webhook configuration by default
asserts:
- containsDocument:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
name: spark-operator-webhook
- it: Should not create the validating webhook configuration if `webhook.enable` is `false`
set:
webhook:
enable: false
asserts:
- hasDocuments:
count: 0
- it: Should use the specified webhook port
set:
webhook:
port: 12345
asserts:
- equal:
path: webhooks[*].clientConfig.service.port
value: 12345
- it: Should use the specified failure policy
set:
webhook:
failurePolicy: Fail
asserts:
- equal:
path: webhooks[*].failurePolicy
value: Fail
- it: Should set namespaceSelector if `spark.jobNamespaces` is set with non-empty strings
set:
spark.jobNamespaces:
- ns1
- ns2
- ns3
asserts:
- equal:
path: webhooks[*].namespaceSelector
value:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
- ns1
- ns2
- ns3
- it: Should not set namespaceSelector if `spark.jobNamespaces` contains empty string
set:
spark:
jobNamespaces:
- ""
- ns1
- ns2
- ns3
asserts:
- notExists:
path: webhooks[*].namespaceSelector
  - it: Should use the specified timeoutSeconds
set:
webhook:
timeoutSeconds: 5
asserts:
- equal:
path: webhooks[*].timeoutSeconds
value: 5


@ -1,443 +1,168 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default values for spark-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# -- String to partially override release name.
nameOverride: ""
# -- String to fully override release name.
fullnameOverride: ""
# -- Common labels to add to the resources.
commonLabels: {}
# replicaCount -- Desired number of pods; leaderElection will be enabled
# if this is greater than 1
replicaCount: 1
image:
# -- Image registry.
registry: ghcr.io
# -- Image repository.
repository: kubeflow/spark-operator/controller
# -- Image tag.
# @default -- If not set, the chart appVersion will be used.
tag: ""
# -- Image pull policy.
# -- Image repository
repository: gcr.io/spark-operator/spark-operator
# -- Image pull policy
pullPolicy: IfNotPresent
# -- Image pull secrets for private image registry.
pullSecrets: []
# - name: <secret-name>
# -- Overrides the image tag whose default is the chart appVersion.
tag: "latest"
controller:
# -- Number of replicas of controller.
replicas: 1
# -- Image pull secrets
imagePullSecrets: []
leaderElection:
# -- Specifies whether to enable leader election for controller.
enable: true
# -- String to partially override `spark-operator.fullname` template (will maintain the release name)
nameOverride: ""
# -- Reconcile concurrency, higher values might increase memory usage.
workers: 10
# -- String to override release name
fullnameOverride: ""
# -- Configure the verbosity of logging, can be one of `debug`, `info`, `error`.
logLevel: info
rbac:
# -- **DEPRECATED** use `createRole` and `createClusterRole`
create: false
# -- Create and use RBAC `Role` resources
createRole: true
# -- Create and use RBAC `ClusterRole` resources
createClusterRole: true
# -- Configure the encoder of logging, can be one of `console` or `json`.
logEncoder: console
  # -- Grace period after a successful spark-submit during which "driver pod not found" errors will be retried. Useful if the driver pod can take some time to be created.
driverPodCreationGracePeriod: 10s
# -- Specifies the maximum number of Executor pods that can be tracked by the controller per SparkApplication.
maxTrackedExecutorPerApp: 1000
uiService:
# -- Specifies whether to create service for Spark web UI.
enable: true
uiIngress:
# -- Specifies whether to create ingress for Spark web UI.
# `controller.uiService.enable` must be `true` to enable ingress.
enable: false
# -- Ingress URL format.
# Required if `controller.uiIngress.enable` is true.
urlFormat: ""
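    # A hypothetical example (the domain is a placeholder); the operator expands
    # template variables such as {{$appName}} and {{$appNamespace}}:
    # urlFormat: "{{$appName}}.ingress.example.com"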
# -- Optionally set the ingressClassName.
ingressClassName: ""
# -- Optionally set default TLS configuration for the Spark UI's ingress. `ingressTLS` in the SparkApplication spec overrides this.
tls: []
# - hosts:
# - "*.example.com"
# secretName: "example-secret"
# -- Optionally set default ingress annotations for the Spark UI's ingress. `ingressAnnotations` in the SparkApplication spec overrides this.
annotations: {}
# key1: value1
# key2: value2
batchScheduler:
# -- Specifies whether to enable batch scheduler for spark jobs scheduling.
# If enabled, users can specify batch scheduler name in spark application.
enable: false
# -- Specifies a list of kube-scheduler names for scheduling Spark pods.
kubeSchedulerNames: []
# - default-scheduler
# -- Default batch scheduler to be used if not specified by the user.
# If specified, this value must be either "volcano" or "yunikorn". Specifying any other
# value will cause the controller to error on startup.
default: ""
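    # For example, to fall back to Volcano when an application does not name a
    # scheduler (assuming Volcano is installed in the cluster):
    # default: "volcano"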
serviceAccount:
# -- Specifies whether to create a service account for the controller.
serviceAccounts:
spark:
# -- Create a service account for spark apps
create: true
# -- Optional name for the controller service account.
# -- Optional name for the spark service account
name: ""
# -- Extra annotations for the controller service account.
# -- Optional annotations for the spark service account
annotations: {}
# -- Auto-mount service account token to the controller pods.
automountServiceAccountToken: true
rbac:
# -- Specifies whether to create RBAC resources for the controller.
sparkoperator:
# -- Create a service account for the operator
create: true
# -- Extra annotations for the controller RBAC resources.
# -- Optional name for the operator service account
name: ""
# -- Optional annotations for the operator service account
annotations: {}
# -- Extra labels for controller pods.
labels: {}
# key1: value1
# key2: value2
# -- Set this if running spark jobs in a different namespace than the operator
sparkJobNamespace: ""
# -- Extra annotations for controller pods.
annotations: {}
# key1: value1
# key2: value2
# -- Operator concurrency, higher values might increase memory usage
controllerThreads: 10
# -- Volumes for controller pods.
volumes:
# Create a tmp directory to write Spark artifacts to for deployed Spark apps.
- name: tmp
emptyDir:
sizeLimit: 1Gi
# -- Operator resync interval. Note that the operator will respond to events (e.g. create, update)
# regardless of this setting
resyncInterval: 30
# -- Node selector for controller pods.
nodeSelector: {}
# -- Affinity for controller pods.
affinity: {}
# -- List of node taints to tolerate for controller pods.
tolerations: []
# -- Priority class for controller pods.
priorityClassName: ""
# -- Security context for controller pods.
podSecurityContext:
fsGroup: 185
# -- Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in.
# Ref: [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/).
# The labelSelector field in topology spread constraint will be set to the selector labels for controller pods if not specified.
topologySpreadConstraints: []
# - maxSkew: 1
# topologyKey: topology.kubernetes.io/zone
# whenUnsatisfiable: ScheduleAnyway
# - maxSkew: 1
# topologyKey: kubernetes.io/hostname
# whenUnsatisfiable: DoNotSchedule
# -- Environment variables for controller containers.
env: []
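  # An illustrative example, mirroring the chart tests above:
  # - name: ENV_NAME_1
  #   value: ENV_VALUE_1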
# -- Environment variable sources for controller containers.
envFrom: []
# -- Volume mounts for controller containers.
volumeMounts:
# Mount a tmp directory to write Spark artifacts to for deployed Spark apps.
- name: tmp
mountPath: "/tmp"
readOnly: false
# -- Pod resource requests and limits for controller containers.
  # Note that each job submission will spawn a JVM within the controller pods using "/usr/local/openjdk-11/bin/java -Xmx128m".
# Kubernetes may kill these Java processes at will to enforce resource limits. When that happens, you will see the following error:
# 'failed to run spark-submit for SparkApplication [...]: signal: killed' - when this happens, you may want to increase memory limits.
resources: {}
# limits:
# cpu: 100m
# memory: 300Mi
# requests:
# cpu: 100m
# memory: 300Mi
# -- Security context for controller containers.
securityContext:
readOnlyRootFilesystem: true
privileged: false
allowPrivilegeEscalation: false
runAsNonRoot: true
capabilities:
drop:
- ALL
seccompProfile:
type: RuntimeDefault
# -- Sidecar containers for controller pods.
sidecars: []
# Pod disruption budget for controller to avoid service degradation.
podDisruptionBudget:
# -- Specifies whether to create pod disruption budget for controller.
# Ref: [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
enable: false
# -- The number of pods that must be available.
# Require `controller.replicas` to be greater than 1
minAvailable: 1
pprof:
# -- Specifies whether to enable pprof.
enable: false
# -- Specifies pprof port.
port: 6060
# -- Specifies pprof service port name.
portName: pprof
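    # Once enabled, the controller binds a profiling server to this port
    # (`--pprof-bind-address`, see the chart tests above). Assuming the standard
    # Go net/http/pprof handler paths, profiles can then be fetched with e.g.:
    #   go tool pprof http://<controller-pod-ip>:6060/debug/pprof/heap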
# Workqueue rate limiter configuration forwarded to the controller-runtime Reconciler.
workqueueRateLimiter:
    # -- Specifies the average rate of items processed by the workqueue rate limiter.
bucketQPS: 50
# -- Specifies the maximum number of items that can be in the workqueue at any given time.
bucketSize: 500
maxDelay:
# -- Specifies whether to enable max delay for the workqueue rate limiter.
# This is useful to avoid losing events when the workqueue is full.
enable: true
# -- Specifies the maximum delay duration for the workqueue rate limiter.
duration: 6h
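    # These settings map to the controller flags `--workqueue-ratelimiter-bucket-qps`,
    # `--workqueue-ratelimiter-bucket-size` and `--workqueue-ratelimiter-max-delay`
    # (see the chart tests above).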
webhook:
# -- Specifies whether to enable webhook.
uiService:
  # -- Enable UI service creation for Spark applications
enable: true
# -- Number of replicas of webhook server.
replicas: 1
# -- Ingress URL format.
# Requires the UI service to be enabled by setting `uiService.enable` to true.
ingressUrlFormat: ""
leaderElection:
# -- Specifies whether to enable leader election for webhook.
enable: true
# -- Set higher levels for more verbose logging
logLevel: 2
# -- Configure the verbosity of logging, can be one of `debug`, `info`, `error`.
logLevel: info
# podSecurityContext -- Pod security context
podSecurityContext: {}
# -- Configure the encoder of logging, can be one of `console` or `json`.
logEncoder: console
# securityContext -- Operator container security context
securityContext: {}
# -- Specifies webhook port.
port: 9443
# -- Specifies webhook service port name.
portName: webhook
# -- Specifies how unrecognized errors are handled.
# Available options are `Ignore` or `Fail`.
failurePolicy: Fail
# -- Specifies the timeout seconds of the webhook, the value must be between 1 and 30.
timeoutSeconds: 10
resourceQuotaEnforcement:
# -- Specifies whether to enable the ResourceQuota enforcement for SparkApplication resources.
enable: false
serviceAccount:
# -- Specifies whether to create a service account for the webhook.
create: true
# -- Optional name for the webhook service account.
name: ""
# -- Extra annotations for the webhook service account.
annotations: {}
# -- Auto-mount service account token to the webhook pods.
automountServiceAccountToken: true
rbac:
# -- Specifies whether to create RBAC resources for the webhook.
create: true
# -- Extra annotations for the webhook RBAC resources.
annotations: {}
# -- Extra labels for webhook pods.
labels: {}
# key1: value1
# key2: value2
# -- Extra annotations for webhook pods.
annotations: {}
# key1: value1
# key2: value2
# -- Sidecar containers for webhook pods.
sidecars: []
# -- Volumes for webhook pods.
volumes:
# Create a dir for the webhook to generate its certificates in.
- name: serving-certs
emptyDir:
sizeLimit: 500Mi
# -- Node selector for webhook pods.
nodeSelector: {}
# -- Affinity for webhook pods.
affinity: {}
# -- List of node taints to tolerate for webhook pods.
tolerations: []
# -- Priority class for webhook pods.
priorityClassName: ""
# -- Security context for webhook pods.
podSecurityContext:
fsGroup: 185
# -- Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in.
# Ref: [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/).
# The labelSelector field in topology spread constraint will be set to the selector labels for webhook pods if not specified.
topologySpreadConstraints: []
# - maxSkew: 1
# topologyKey: topology.kubernetes.io/zone
# whenUnsatisfiable: ScheduleAnyway
# - maxSkew: 1
# topologyKey: kubernetes.io/hostname
# whenUnsatisfiable: DoNotSchedule
# -- Environment variables for webhook containers.
env: []
# -- Environment variable sources for webhook containers.
envFrom: []
# -- Volume mounts for webhook containers.
volumeMounts:
# Mount a dir for the webhook to generate its certificates in.
- name: serving-certs
mountPath: /etc/k8s-webhook-server/serving-certs
subPath: serving-certs
readOnly: false
# -- Pod resource requests and limits for webhook pods.
resources: {}
# limits:
# cpu: 100m
# memory: 300Mi
# requests:
# cpu: 100m
# memory: 300Mi
# -- Security context for webhook containers.
securityContext:
readOnlyRootFilesystem: true
privileged: false
allowPrivilegeEscalation: false
runAsNonRoot: true
capabilities:
drop:
- ALL
seccompProfile:
type: RuntimeDefault
# Pod disruption budget for webhook to avoid service degradation.
podDisruptionBudget:
# -- Specifies whether to create pod disruption budget for webhook.
# Ref: [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
enable: false
# -- The number of pods that must be available.
# Requires `webhook.replicas` to be greater than 1
minAvailable: 1
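Because `minAvailable` must be lower than the number of replicas for voluntary disruptions to remain possible, enabling this budget is only useful together with a larger `webhook.replicas`. A minimal sketch of such an override (the exact values are illustrative):

webhook:
  replicas: 3
  podDisruptionBudget:
    enable: true
    minAvailable: 2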
spark:
# -- List of namespaces in which to run spark jobs.
# If an empty string is included, all namespaces are allowed.
# Make sure the namespaces already exist.
jobNamespaces:
- default
serviceAccount:
# -- Specifies whether to create a service account for spark applications.
create: true
# -- Optional name for the spark service account.
name: ""
# -- Optional annotations for the spark service account.
annotations: {}
# -- Auto-mount service account token to the spark applications pods.
automountServiceAccountToken: true
rbac:
# -- Specifies whether to create RBAC resources for spark applications.
create: true
# -- Optional annotations for the spark application RBAC resources.
annotations: {}
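A SparkApplication submitted to one of the namespaces in `spark.jobNamespaces` can then reference the service account created above. A minimal sketch, in which the image, jar path and service account name are illustrative (the actual account name depends on the Helm release name):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default            # must be listed in spark.jobNamespaces
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.3
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: 3.5.3
  driver:
    serviceAccount: spark-operator-spark  # created when spark.serviceAccount.create is true
  executor:
    instances: 1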
prometheus:
metrics:
# -- Specifies whether to enable prometheus metrics scraping.
enable: true
# -- Metrics port.
port: 8080
# -- Metrics port name.
portName: metrics
# -- Metrics serving endpoint.
endpoint: /metrics
# -- Metrics prefix, will be added to all exported metrics.
prefix: ""
# -- Job Start Latency histogram buckets. Specified in seconds.
jobStartLatencyBuckets: "30,60,90,120,150,180,210,240,270,300"
# Prometheus pod monitor for controller pods
podMonitor:
# -- Specifies whether to create pod monitor.
# Note that prometheus metrics should be enabled as well.
create: false
# -- Pod monitor labels
labels: {}
# -- The label to use to retrieve the job name from
jobLabel: spark-operator-podmonitor
# -- Prometheus metrics endpoint properties. `metrics.portName` will be used as a port
podMetricsEndpoint:
scheme: http
interval: 5s
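For a Prometheus operator instance to discover these metrics, both the metrics endpoint and the pod monitor must be enabled, and the pod monitor typically needs whatever label your Prometheus is configured to select on. A sketch in which the `release: prometheus` label is an assumption about that setup:

prometheus:
  metrics:
    enable: true
  podMonitor:
    create: true
    labels:
      release: prometheus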
certManager:
# -- Specifies whether to use [cert-manager](https://cert-manager.io) to generate certificate for webhook.
# `webhook.enable` must be set to `true` to enable cert-manager.
enable: false
# -- The reference to the issuer.
# @default -- A self-signed issuer will be created and used if not specified.
issuerRef: {}
# group: cert-manager.io
# kind: ClusterIssuer
# name: selfsigned
# -- The duration of the certificate validity (e.g. `2160h`).
# See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate).
# @default -- `2160h` (90 days) will be used if not specified.
duration: ""
# -- The duration before the certificate expiration to renew the certificate (e.g. `720h`).
# See [cert-manager.io/v1.Certificate](https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.Certificate).
# @default -- 1/3 of the issued certificate's lifetime.
renewBefore: ""
webhook:
# -- Enable webhook server
enable: false
# -- Webhook service port
port: 8080
# -- The webhook server will only operate on namespaces with this label, specified in the form key1=value1,key2=value2.
# Empty string (default) will operate on all namespaces.
namespaceSelector: ""
# -- The annotations applied to the init job, required to restore certs deleted by the cleanup job during upgrade
initAnnotations:
"helm.sh/hook": pre-install, pre-upgrade
"helm.sh/hook-weight": "50"
# -- The annotations applied to the cleanup job, required for helm lifecycle hooks
cleanupAnnotations:
"helm.sh/hook": pre-delete, pre-upgrade
"helm.sh/hook-delete-policy": hook-succeeded
metrics:
# -- Enable prometheus metric scraping
enable: true
# -- Metrics port
port: 10254
# -- Metrics port name
portName: metrics
# -- Metrics serving endpoint
endpoint: /metrics
# -- Metric prefix, will be added to all exported metrics
prefix: ""
# -- Prometheus pod monitor for the operator's pod.
podMonitor:
# -- If enabled, a pod monitor for the operator's pod will be submitted. Note that prometheus metrics should be enabled as well.
enable: false
# -- Pod monitor labels
labels: {}
# -- The label to use to retrieve the job name from
jobLabel: spark-operator-podmonitor
# -- Prometheus metrics endpoint properties. `metrics.portName` will be used as a port
podMetricsEndpoint:
scheme: http
interval: 5s
# nodeSelector -- Node labels for pod assignment
nodeSelector: {}
# tolerations -- List of node taints to tolerate
tolerations: []
# affinity -- Affinity for pod assignment
affinity: {}
# podAnnotations -- Additional annotations to add to the pod
podAnnotations: {}
# podLabels -- Additional labels to add to the pod
podLabels: {}
# resources -- Pod resource requests and limits
# Note that each job submission will spawn a JVM within the Spark Operator Pod using "/usr/local/openjdk-11/bin/java -Xmx128m".
# Kubernetes may kill these Java processes at will to enforce resource limits. When that happens, you will see the following error:
# 'failed to run spark-submit for SparkApplication [...]: signal: killed' - when this happens, you may want to increase memory limits.
resources: {}
# limits:
# cpu: 100m
# memory: 300Mi
# requests:
# cpu: 100m
# memory: 300Mi
batchScheduler:
# -- Enable batch scheduler for spark job scheduling. If enabled, users can specify the batch scheduler name in the spark application.
enable: false
resourceQuotaEnforcement:
# -- Whether to enable the ResourceQuota enforcement for SparkApplication resources.
# Requires the webhook to be enabled by setting `webhook.enable` to true.
# Ref: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#enabling-resource-quota-enforcement.
enable: false
leaderElection:
# -- Leader election lock name.
# Ref: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#enabling-leader-election-for-high-availability.
lockName: "spark-operator-lock"
# -- Optionally store the lock in another namespace. Defaults to the operator's namespace.
lockNamespace: ""
istio:
# -- When using `istio`, spark jobs need to run without a sidecar to properly terminate
enabled: false
# labelSelectorFilter -- A comma-separated list of key=value, or key labels to filter resources during watch and list based on the specified labels.
labelSelectorFilter: ""
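The filter accepts the same comma-separated forms described above, a bare key or a key=value pair. For example, to watch only resources that carry a team label and are marked as production (label names assumed for illustration):

labelSelectorFilter: "team,environment=production"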

View File

@@ -1,463 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package controller
import (
"crypto/tls"
"encoding/json"
"flag"
"fmt"
"os"
"slices"
"time"
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
"github.com/spf13/cobra"
"github.com/spf13/viper"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
"golang.org/x/time/rate"
corev1 "k8s.io/api/core/v1"
networkingv1 "k8s.io/api/networking/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
"k8s.io/client-go/kubernetes"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
"k8s.io/utils/clock"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller"
"sigs.k8s.io/controller-runtime/pkg/healthz"
logzap "sigs.k8s.io/controller-runtime/pkg/log/zap"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
ctrlwebhook "sigs.k8s.io/controller-runtime/pkg/webhook"
schedulingv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"
sparkoperator "github.com/kubeflow/spark-operator/v2"
"github.com/kubeflow/spark-operator/v2/api/v1alpha1"
"github.com/kubeflow/spark-operator/v2/api/v1beta2"
"github.com/kubeflow/spark-operator/v2/internal/controller/scheduledsparkapplication"
"github.com/kubeflow/spark-operator/v2/internal/controller/sparkapplication"
"github.com/kubeflow/spark-operator/v2/internal/controller/sparkconnect"
"github.com/kubeflow/spark-operator/v2/internal/metrics"
"github.com/kubeflow/spark-operator/v2/internal/scheduler"
"github.com/kubeflow/spark-operator/v2/internal/scheduler/kubescheduler"
"github.com/kubeflow/spark-operator/v2/internal/scheduler/volcano"
"github.com/kubeflow/spark-operator/v2/internal/scheduler/yunikorn"
"github.com/kubeflow/spark-operator/v2/pkg/common"
"github.com/kubeflow/spark-operator/v2/pkg/util"
// +kubebuilder:scaffold:imports
)
var (
scheme = runtime.NewScheme()
logger = ctrl.Log.WithName("")
)
var (
namespaces []string
// Controller
controllerThreads int
cacheSyncTimeout time.Duration
maxTrackedExecutorPerApp int
// Workqueue
workqueueRateLimiterBucketQPS int
workqueueRateLimiterBucketSize int
workqueueRateLimiterMaxDelay time.Duration
// Batch scheduler
enableBatchScheduler bool
kubeSchedulerNames []string
defaultBatchScheduler string
// Spark web UI service and ingress
enableUIService bool
ingressClassName string
ingressURLFormat string
ingressTLS []networkingv1.IngressTLS
ingressAnnotations map[string]string
// Leader election
enableLeaderElection bool
leaderElectionLockName string
leaderElectionLockNamespace string
leaderElectionLeaseDuration time.Duration
leaderElectionRenewDeadline time.Duration
leaderElectionRetryPeriod time.Duration
driverPodCreationGracePeriod time.Duration
// Metrics
enableMetrics bool
metricsBindAddress string
metricsEndpoint string
metricsPrefix string
metricsLabels []string
metricsJobStartLatencyBuckets []float64
healthProbeBindAddress string
pprofBindAddress string
secureMetrics bool
enableHTTP2 bool
development bool
zapOptions = logzap.Options{}
)
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(schedulingv1alpha1.AddToScheme(scheme))
utilruntime.Must(v1alpha1.AddToScheme(scheme))
utilruntime.Must(v1beta2.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
func NewStartCommand() *cobra.Command {
var ingressTLSstring string
var ingressAnnotationsString string
var command = &cobra.Command{
Use: "start",
Short: "Start controller and webhook",
PreRunE: func(_ *cobra.Command, args []string) error {
development = viper.GetBool("development")
if ingressTLSstring != "" {
if err := json.Unmarshal([]byte(ingressTLSstring), &ingressTLS); err != nil {
return fmt.Errorf("failed parsing ingress-tls JSON string from CLI: %v", err)
}
}
if ingressAnnotationsString != "" {
if err := json.Unmarshal([]byte(ingressAnnotationsString), &ingressAnnotations); err != nil {
return fmt.Errorf("failed parsing ingress-annotations JSON string from CLI: %v", err)
}
}
return nil
},
Run: func(_ *cobra.Command, args []string) {
sparkoperator.PrintVersion(false)
start()
},
}
command.Flags().IntVar(&controllerThreads, "controller-threads", 10, "Number of worker threads used by the SparkApplication controller.")
command.Flags().StringSliceVar(&namespaces, "namespaces", []string{}, "The list of Kubernetes namespaces to manage. Custom resource objects of the managed CRD types will be managed across the whole cluster if this is unset or contains an empty string.")
command.Flags().DurationVar(&cacheSyncTimeout, "cache-sync-timeout", 30*time.Second, "Informer cache sync timeout.")
command.Flags().IntVar(&maxTrackedExecutorPerApp, "max-tracked-executor-per-app", 1000, "The maximum number of tracked executors per SparkApplication.")
command.Flags().IntVar(&workqueueRateLimiterBucketQPS, "workqueue-ratelimiter-bucket-qps", 10, "QPS of the bucket rate of the workqueue.")
command.Flags().IntVar(&workqueueRateLimiterBucketSize, "workqueue-ratelimiter-bucket-size", 100, "The token bucket size of the workqueue.")
command.Flags().DurationVar(&workqueueRateLimiterMaxDelay, "workqueue-ratelimiter-max-delay", rate.InfDuration, "The maximum delay of the workqueue.")
command.Flags().BoolVar(&enableBatchScheduler, "enable-batch-scheduler", false, "Enable batch schedulers.")
command.Flags().StringSliceVar(&kubeSchedulerNames, "kube-scheduler-names", []string{}, "The kube-scheduler names for scheduling Spark applications.")
command.Flags().StringVar(&defaultBatchScheduler, "default-batch-scheduler", "", "Default batch scheduler.")
command.Flags().BoolVar(&enableUIService, "enable-ui-service", true, "Enable Spark Web UI service.")
command.Flags().StringVar(&ingressClassName, "ingress-class-name", "", "Set ingressClassName for ingress resources created.")
command.Flags().StringVar(&ingressURLFormat, "ingress-url-format", "", "Ingress URL format.")
command.Flags().StringVar(&ingressTLSstring, "ingress-tls", "", "JSON format string for the default TLS config on the Spark UI ingresses. e.g. '[{\"hosts\":[\"*.example.com\"],\"secretName\":\"example-secret\"}]'. `ingressTLS` in the SparkApplication spec will override this value.")
command.Flags().StringVar(&ingressAnnotationsString, "ingress-annotations", "", "JSON format string for the default ingress annotations for the Spark UI ingresses. e.g. '{\"cert-manager.io/cluster-issuer\": \"letsencrypt\"}'. `ingressAnnotations` in the SparkApplication spec will override this value.")
command.Flags().BoolVar(&enableLeaderElection, "leader-election", false, "Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
command.Flags().StringVar(&leaderElectionLockName, "leader-election-lock-name", "spark-operator-lock", "Name of the ConfigMap for leader election.")
command.Flags().StringVar(&leaderElectionLockNamespace, "leader-election-lock-namespace", "spark-operator", "Namespace in which to create the ConfigMap for leader election.")
command.Flags().DurationVar(&leaderElectionLeaseDuration, "leader-election-lease-duration", 15*time.Second, "Leader election lease duration.")
command.Flags().DurationVar(&leaderElectionRenewDeadline, "leader-election-renew-deadline", 14*time.Second, "Leader election renew deadline.")
command.Flags().DurationVar(&leaderElectionRetryPeriod, "leader-election-retry-period", 4*time.Second, "Leader election retry period.")
command.Flags().DurationVar(&driverPodCreationGracePeriod, "driver-pod-creation-grace-period", 10*time.Second, "Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created.")
command.Flags().BoolVar(&enableMetrics, "enable-metrics", false, "Enable metrics.")
command.Flags().StringVar(&metricsBindAddress, "metrics-bind-address", "0", "The address the metrics endpoint binds to, e.g. ':8080'. "+
"Defaults to '0', which disables the metrics server.")
command.Flags().StringVar(&metricsEndpoint, "metrics-endpoint", "/metrics", "Metrics endpoint.")
command.Flags().StringVar(&metricsPrefix, "metrics-prefix", "", "Prefix for the metrics.")
command.Flags().StringSliceVar(&metricsLabels, "metrics-labels", []string{}, "Labels to be added to the metrics.")
command.Flags().Float64SliceVar(&metricsJobStartLatencyBuckets, "metrics-job-start-latency-buckets", []float64{30, 60, 90, 120, 150, 180, 210, 240, 270, 300}, "Buckets for the job start latency histogram.")
command.Flags().StringVar(&healthProbeBindAddress, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
command.Flags().BoolVar(&secureMetrics, "secure-metrics", false, "If set, the metrics endpoint is served securely.")
command.Flags().BoolVar(&enableHTTP2, "enable-http2", false, "If set, HTTP/2 will be enabled for the metrics and webhook servers")
command.Flags().StringVar(&pprofBindAddress, "pprof-bind-address", "0", "The address the pprof endpoint binds to. "+
"Defaults to '0', which disables the pprof server.")
flagSet := flag.NewFlagSet("controller", flag.ExitOnError)
ctrl.RegisterFlags(flagSet)
zapOptions.BindFlags(flagSet)
command.Flags().AddGoFlagSet(flagSet)
return command
}
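// For orientation, the flags registered above are what a Deployment manifest
// ultimately passes to this start subcommand. A hand-written sketch of such a
// container spec; the image reference and flag values are illustrative, not
// chart defaults:
//
//	containers:
//	  - name: spark-operator-controller
//	    image: kubeflow/spark-operator:latest
//	    args:
//	      - controller
//	      - start
//	      - --namespaces=default
//	      - --controller-threads=10
//	      - --enable-metrics=true
//	      - --metrics-bind-address=:8080
//	      - --leader-election=true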
func start() {
setupLog()
// Create the client rest config. Use kubeConfig if given, otherwise assume in-cluster.
cfg, err := ctrl.GetConfig()
if err != nil {
logger.Error(err, "failed to get kube config")
os.Exit(1)
}
// Create the manager.
tlsOptions := newTLSOptions()
mgr, err := ctrl.NewManager(cfg, ctrl.Options{
Scheme: scheme,
Cache: newCacheOptions(),
Metrics: metricsserver.Options{
BindAddress: metricsBindAddress,
SecureServing: secureMetrics,
TLSOpts: tlsOptions,
},
WebhookServer: ctrlwebhook.NewServer(ctrlwebhook.Options{
TLSOpts: tlsOptions,
}),
HealthProbeBindAddress: healthProbeBindAddress,
PprofBindAddress: pprofBindAddress,
LeaderElection: enableLeaderElection,
LeaderElectionID: leaderElectionLockName,
LeaderElectionNamespace: leaderElectionLockNamespace,
// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader doesn't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after
// the manager stops, so it would be fine to enable this option. However,
// if you are doing or intend to do any operation such as performing cleanups
// after the manager stops, then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,
})
if err != nil {
logger.Error(err, "failed to create manager")
os.Exit(1)
}
clientset, err := kubernetes.NewForConfig(cfg)
if err != nil {
logger.Error(err, "failed to create clientset")
os.Exit(1)
}
if err = util.InitializeIngressCapabilities(clientset); err != nil {
logger.Error(err, "failed to retrieve cluster ingress capabilities")
os.Exit(1)
}
var registry *scheduler.Registry
if enableBatchScheduler {
registry = scheduler.GetRegistry()
_ = registry.Register(common.VolcanoSchedulerName, volcano.Factory)
_ = registry.Register(yunikorn.SchedulerName, yunikorn.Factory)
// Register kube-schedulers.
for _, name := range kubeSchedulerNames {
_ = registry.Register(name, kubescheduler.Factory)
}
schedulerNames := registry.GetRegisteredSchedulerNames()
if defaultBatchScheduler != "" && !slices.Contains(schedulerNames, defaultBatchScheduler) {
logger.Error(nil, "Failed to find default batch scheduler in registered schedulers")
os.Exit(1)
}
}
sparkSubmitter := &sparkapplication.SparkSubmitter{}
// Setup controller for SparkApplication.
if err = sparkapplication.NewReconciler(
mgr,
mgr.GetScheme(),
mgr.GetClient(),
mgr.GetEventRecorderFor("spark-application-controller"),
registry,
sparkSubmitter,
newSparkApplicationReconcilerOptions(),
).SetupWithManager(mgr, newControllerOptions()); err != nil {
logger.Error(err, "Failed to create controller", "controller", "SparkApplication")
os.Exit(1)
}
// Setup controller for ScheduledSparkApplication.
if err = scheduledsparkapplication.NewReconciler(
mgr.GetScheme(),
mgr.GetClient(),
mgr.GetEventRecorderFor("scheduled-spark-application-controller"),
clock.RealClock{},
newScheduledSparkApplicationReconcilerOptions(),
).SetupWithManager(mgr, newControllerOptions()); err != nil {
logger.Error(err, "Failed to create controller", "controller", "ScheduledSparkApplication")
os.Exit(1)
}
// Setup controller for SparkConnect.
if err = sparkconnect.NewReconciler(
mgr,
mgr.GetScheme(),
mgr.GetClient(),
mgr.GetEventRecorderFor("SparkConnect"),
newSparkConnectReconcilerOptions(),
).SetupWithManager(mgr, newControllerOptions()); err != nil {
logger.Error(err, "Failed to create controller", "controller", "SparkConnect")
os.Exit(1)
}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
logger.Error(err, "Failed to set up health check")
os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
logger.Error(err, "Failed to set up ready check")
os.Exit(1)
}
logger.Info("Starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
logger.Error(err, "Failed to start manager")
os.Exit(1)
}
}
// setupLog configures the logging system.
func setupLog() {
ctrl.SetLogger(logzap.New(
logzap.UseFlagOptions(&zapOptions),
func(o *logzap.Options) {
o.Development = development
o.ZapOpts = append(o.ZapOpts, zap.AddCaller())
o.EncoderConfigOptions = append(o.EncoderConfigOptions, func(config *zapcore.EncoderConfig) {
config.EncodeLevel = zapcore.CapitalLevelEncoder
config.EncodeTime = zapcore.ISO8601TimeEncoder
config.EncodeCaller = zapcore.ShortCallerEncoder
})
}),
)
}
func newTLSOptions() []func(c *tls.Config) {
// if the enable-http2 flag is false (the default), http/2 should be disabled
// due to its vulnerabilities. More specifically, disabling http/2 will
// prevent the servers from being vulnerable to the HTTP/2 Stream Cancellation and
// Rapid Reset CVEs. For more information see:
// - https://github.com/advisories/GHSA-qppj-fm5r-hxr3
// - https://github.com/advisories/GHSA-4374-p667-p6c8
disableHTTP2 := func(c *tls.Config) {
logger.Info("disabling http/2")
c.NextProtos = []string{"http/1.1"}
}
tlsOpts := []func(*tls.Config){}
if !enableHTTP2 {
tlsOpts = append(tlsOpts, disableHTTP2)
}
return tlsOpts
}
// newCacheOptions creates and returns a cache.Options instance configured with default namespaces and object caching settings.
func newCacheOptions() cache.Options {
defaultNamespaces := make(map[string]cache.Config)
if !util.ContainsString(namespaces, cache.AllNamespaces) {
for _, ns := range namespaces {
defaultNamespaces[ns] = cache.Config{}
}
}
options := cache.Options{
Scheme: scheme,
DefaultNamespaces: defaultNamespaces,
ByObject: map[client.Object]cache.ByObject{
&corev1.Pod{}: {
Label: labels.SelectorFromSet(labels.Set{
common.LabelLaunchedBySparkOperator: "true",
}),
},
&corev1.ConfigMap{}: {},
&corev1.PersistentVolumeClaim{}: {},
&corev1.Service{}: {},
&v1beta2.SparkApplication{}: {},
&v1beta2.ScheduledSparkApplication{}: {},
&v1alpha1.SparkConnect{}: {},
},
}
return options
}
// newControllerOptions creates and returns a controller.Options instance configured with the given options.
func newControllerOptions() controller.Options {
options := controller.Options{
MaxConcurrentReconciles: controllerThreads,
CacheSyncTimeout: cacheSyncTimeout,
RateLimiter: util.NewRateLimiter[ctrl.Request](workqueueRateLimiterBucketQPS, workqueueRateLimiterBucketSize, workqueueRateLimiterMaxDelay),
}
return options
}
func newSparkApplicationReconcilerOptions() sparkapplication.Options {
var sparkApplicationMetrics *metrics.SparkApplicationMetrics
var sparkExecutorMetrics *metrics.SparkExecutorMetrics
if enableMetrics {
sparkApplicationMetrics = metrics.NewSparkApplicationMetrics(metricsPrefix, metricsLabels, metricsJobStartLatencyBuckets)
sparkApplicationMetrics.Register()
sparkExecutorMetrics = metrics.NewSparkExecutorMetrics(metricsPrefix, metricsLabels)
sparkExecutorMetrics.Register()
}
options := sparkapplication.Options{
Namespaces: namespaces,
EnableUIService: enableUIService,
IngressClassName: ingressClassName,
IngressURLFormat: ingressURLFormat,
IngressTLS: ingressTLS,
IngressAnnotations: ingressAnnotations,
DefaultBatchScheduler: defaultBatchScheduler,
DriverPodCreationGracePeriod: driverPodCreationGracePeriod,
SparkApplicationMetrics: sparkApplicationMetrics,
SparkExecutorMetrics: sparkExecutorMetrics,
MaxTrackedExecutorPerApp: maxTrackedExecutorPerApp,
}
if enableBatchScheduler {
options.KubeSchedulerNames = kubeSchedulerNames
}
return options
}
func newScheduledSparkApplicationReconcilerOptions() scheduledsparkapplication.Options {
options := scheduledsparkapplication.Options{
Namespaces: namespaces,
}
return options
}
func newSparkConnectReconcilerOptions() sparkconnect.Options {
options := sparkconnect.Options{
Namespaces: namespaces,
}
return options
}

View File

@@ -1,49 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
"fmt"
"os"
"github.com/spf13/cobra"
"github.com/kubeflow/spark-operator/v2/cmd/operator/controller"
"github.com/kubeflow/spark-operator/v2/cmd/operator/version"
"github.com/kubeflow/spark-operator/v2/cmd/operator/webhook"
)
func NewCommand() *cobra.Command {
command := &cobra.Command{
Use: "spark-operator",
Short: "Spark operator",
RunE: func(cmd *cobra.Command, _ []string) error {
return cmd.Help()
},
}
command.AddCommand(controller.NewCommand())
command.AddCommand(webhook.NewCommand())
command.AddCommand(version.NewCommand())
return command
}
func main() {
if err := NewCommand().Execute(); err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
os.Exit(1)
}
}

View File

@@ -1,40 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package version
import (
"github.com/spf13/cobra"
sparkoperator "github.com/kubeflow/spark-operator/v2"
)
var (
short bool
)
func NewCommand() *cobra.Command {
command := &cobra.Command{
Use: "version",
Short: "Print version information",
RunE: func(cmd *cobra.Command, args []string) error {
sparkoperator.PrintVersion(short)
return nil
},
}
command.Flags().BoolVar(&short, "short", false, "Print just the version string.")
return command
}

View File

@@ -1,420 +0,0 @@
/*
Copyright 2024 The Kubeflow authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package webhook
import (
"context"
"crypto/tls"
"flag"
"os"
"time"
// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"
"github.com/spf13/cobra"
"github.com/spf13/viper"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/fields"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
"k8s.io/apimachinery/pkg/util/wait"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller"
logzap "sigs.k8s.io/controller-runtime/pkg/log/zap"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
ctrlwebhook "sigs.k8s.io/controller-runtime/pkg/webhook"
sparkoperator "github.com/kubeflow/spark-operator/v2"
"github.com/kubeflow/spark-operator/v2/api/v1beta2"
"github.com/kubeflow/spark-operator/v2/internal/controller/mutatingwebhookconfiguration"
"github.com/kubeflow/spark-operator/v2/internal/controller/validatingwebhookconfiguration"
"github.com/kubeflow/spark-operator/v2/internal/webhook"
"github.com/kubeflow/spark-operator/v2/pkg/certificate"
"github.com/kubeflow/spark-operator/v2/pkg/common"
"github.com/kubeflow/spark-operator/v2/pkg/util"
// +kubebuilder:scaffold:imports
)
var (
scheme = runtime.NewScheme()
logger = ctrl.Log.WithName("")
)
var (
namespaces []string
labelSelectorFilter string
// Controller
controllerThreads int
cacheSyncTimeout time.Duration
// Webhook
enableResourceQuotaEnforcement bool
webhookCertDir string
webhookCertName string
webhookKeyName string
mutatingWebhookName string
validatingWebhookName string
webhookPort int
webhookSecretName string
webhookSecretNamespace string
webhookServiceName string
webhookServiceNamespace string
// Cert Manager
enableCertManager bool
// Leader election
enableLeaderElection bool
leaderElectionLockName string
leaderElectionLockNamespace string
leaderElectionLeaseDuration time.Duration
leaderElectionRenewDeadline time.Duration
leaderElectionRetryPeriod time.Duration
// Metrics
enableMetrics bool
metricsBindAddress string
metricsEndpoint string
metricsPrefix string
metricsLabels []string
healthProbeBindAddress string
secureMetrics bool
enableHTTP2 bool
development bool
zapOptions = logzap.Options{}
)
func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
utilruntime.Must(v1beta2.AddToScheme(scheme))
// +kubebuilder:scaffold:scheme
}
func NewStartCommand() *cobra.Command {
var command = &cobra.Command{
Use: "start",
Short: "Start controller and webhook",
PreRun: func(_ *cobra.Command, args []string) {
development = viper.GetBool("development")
},
Run: func(cmd *cobra.Command, args []string) {
sparkoperator.PrintVersion(false)
start()
},
}
// Controller
command.Flags().IntVar(&controllerThreads, "controller-threads", 10, "Number of worker threads used by the SparkApplication controller.")
command.Flags().StringSliceVar(&namespaces, "namespaces", []string{}, "The list of Kubernetes namespaces to manage. Custom resource objects of the managed CRD types will be managed across the whole cluster if this is unset or contains an empty string.")
command.Flags().StringVar(&labelSelectorFilter, "label-selector-filter", "", "A comma-separated list of key=value, or key labels to filter resources during watch and list based on the specified labels.")
command.Flags().DurationVar(&cacheSyncTimeout, "cache-sync-timeout", 30*time.Second, "Informer cache sync timeout.")
// Webhook
command.Flags().StringVar(&webhookCertDir, "webhook-cert-dir", "/etc/k8s-webhook-server/serving-certs", "The directory that contains the webhook server key and certificate. "+
"When running as nonRoot, you must create and own this directory before running this command.")
command.Flags().StringVar(&webhookCertName, "webhook-cert-name", "tls.crt", "The file name of webhook server certificate.")
command.Flags().StringVar(&webhookKeyName, "webhook-key-name", "tls.key", "The file name of webhook server key.")
command.Flags().StringVar(&mutatingWebhookName, "mutating-webhook-name", "spark-operator-webhook", "The name of the mutating webhook.")
command.Flags().StringVar(&validatingWebhookName, "validating-webhook-name", "spark-operator-webhook", "The name of the validating webhook.")
command.Flags().IntVar(&webhookPort, "webhook-port", 9443, "Service port of the webhook server.")
command.Flags().StringVar(&webhookSecretName, "webhook-secret-name", "spark-operator-webhook-certs", "The name of the secret that contains the webhook server's TLS certificate and key.")
command.Flags().StringVar(&webhookSecretNamespace, "webhook-secret-namespace", "spark-operator", "The namespace of the secret that contains the webhook server's TLS certificate and key.")
command.Flags().StringVar(&webhookServiceName, "webhook-svc-name", "spark-webhook", "The name of the Service for the webhook server.")
command.Flags().StringVar(&webhookServiceNamespace, "webhook-svc-namespace", "spark-webhook", "The namespace of the Service for the webhook server.")
command.Flags().BoolVar(&enableResourceQuotaEnforcement, "enable-resource-quota-enforcement", false, "Whether to enable ResourceQuota enforcement for SparkApplication resources. Requires the webhook to be enabled.")
// Cert Manager
command.Flags().BoolVar(&enableCertManager, "enable-cert-manager", false, "Enable cert-manager to manage the webhook server's TLS certificate.")
// Leader election
command.Flags().BoolVar(&enableLeaderElection, "leader-election", false, "Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
command.Flags().StringVar(&leaderElectionLockName, "leader-election-lock-name", "spark-operator-lock", "Name of the ConfigMap for leader election.")
command.Flags().StringVar(&leaderElectionLockNamespace, "leader-election-lock-namespace", "spark-operator", "Namespace in which to create the ConfigMap for leader election.")
command.Flags().DurationVar(&leaderElectionLeaseDuration, "leader-election-lease-duration", 15*time.Second, "Leader election lease duration.")
command.Flags().DurationVar(&leaderElectionRenewDeadline, "leader-election-renew-deadline", 14*time.Second, "Leader election renew deadline.")
command.Flags().DurationVar(&leaderElectionRetryPeriod, "leader-election-retry-period", 4*time.Second, "Leader election retry period.")
// Prometheus metrics
command.Flags().BoolVar(&enableMetrics, "enable-metrics", false, "Enable metrics.")
command.Flags().StringVar(&metricsBindAddress, "metrics-bind-address", "0", "The address the metrics endpoint binds to, e.g. ':8080'. "+
"Defaults to '0', which disables the metrics server.")
command.Flags().StringVar(&metricsEndpoint, "metrics-endpoint", "/metrics", "Metrics endpoint.")
command.Flags().StringVar(&metricsPrefix, "metrics-prefix", "", "Prefix for the metrics.")
command.Flags().StringSliceVar(&metricsLabels, "metrics-labels", []string{}, "Labels to be added to the metrics.")
command.Flags().StringVar(&healthProbeBindAddress, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
command.Flags().BoolVar(&secureMetrics, "secure-metrics", false, "If set, the metrics endpoint is served securely.")
command.Flags().BoolVar(&enableHTTP2, "enable-http2", false, "If set, HTTP/2 will be enabled for the metrics and webhook servers")
flagSet := flag.NewFlagSet("controller", flag.ExitOnError)
ctrl.RegisterFlags(flagSet)
zapOptions.BindFlags(flagSet)
command.Flags().AddGoFlagSet(flagSet)
return command
}
func start() {
setupLog()
// Create the client rest config. Use kubeConfig if given, otherwise assume in-cluster.
cfg, err := ctrl.GetConfig()
if err != nil {
logger.Error(err, "failed to get kube config")
os.Exit(1)
}
// Create the manager.
tlsOptions := newTLSOptions()
mgr, err := ctrl.NewManager(cfg, ctrl.Options{
Scheme: scheme,
Cache: newCacheOptions(),
Metrics: metricsserver.Options{
BindAddress: metricsBindAddress,
SecureServing: secureMetrics,
TLSOpts: tlsOptions,
},
WebhookServer: ctrlwebhook.NewServer(ctrlwebhook.Options{
Port: webhookPort,
CertDir: webhookCertDir,
CertName: webhookCertName,
KeyName: webhookKeyName,
TLSOpts: tlsOptions,
}),
HealthProbeBindAddress: healthProbeBindAddress,
LeaderElection: enableLeaderElection,
LeaderElectionID: leaderElectionLockName,
LeaderElectionNamespace: leaderElectionLockNamespace,
// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader doesn't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after
// the manager stops, so it would be fine to enable this option. However,
// if you are doing or intend to do any operation such as performing cleanups
// after the manager stops, then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,
})
if err != nil {
logger.Error(err, "Failed to create manager")
os.Exit(1)
}
client, err := client.New(cfg, client.Options{Scheme: mgr.GetScheme()})
if err != nil {
logger.Error(err, "Failed to create client")
os.Exit(1)
}
certProvider := certificate.NewProvider(
client,
webhookServiceName,
webhookServiceNamespace,
enableCertManager,
)
if err := wait.ExponentialBackoff(
wait.Backoff{
Steps: 5,
Duration: 1 * time.Second,
Factor: 2.0,
Jitter: 0.1,
},
func() (bool, error) {
if err := certProvider.SyncSecret(context.TODO(), webhookSecretName, webhookSecretNamespace); err != nil {
if errors.IsAlreadyExists(err) || errors.IsConflict(err) {
return false, nil
}
return false, err
}
return true, nil
},
); err != nil {
logger.Error(err, "Failed to sync webhook secret")
os.Exit(1)
}
logger.Info("Writing certificates", "path", webhookCertDir, "certificate name", webhookCertName, "key name", webhookKeyName)
if err := certProvider.WriteFile(webhookCertDir, webhookCertName, webhookKeyName); err != nil {
logger.Error(err, "Failed to save certificate")
os.Exit(1)
}
if !enableCertManager {
if err := mutatingwebhookconfiguration.NewReconciler(
mgr.GetClient(),
certProvider,
mutatingWebhookName,
).SetupWithManager(mgr, controller.Options{}); err != nil {
logger.Error(err, "Failed to create controller", "controller", "MutatingWebhookConfiguration")
os.Exit(1)
}
if err := validatingwebhookconfiguration.NewReconciler(
mgr.GetClient(),
certProvider,
validatingWebhookName,
).SetupWithManager(mgr, controller.Options{}); err != nil {
logger.Error(err, "Failed to create controller", "controller", "ValidatingWebhookConfiguration")
os.Exit(1)
}
}
if err := ctrl.NewWebhookManagedBy(mgr).
For(&v1beta2.SparkApplication{}).
WithDefaulter(webhook.NewSparkApplicationDefaulter()).
WithValidator(webhook.NewSparkApplicationValidator(mgr.GetClient(), enableResourceQuotaEnforcement)).
Complete(); err != nil {
logger.Error(err, "Failed to create mutating webhook for Spark application")
os.Exit(1)
}
if err := ctrl.NewWebhookManagedBy(mgr).
For(&v1beta2.ScheduledSparkApplication{}).
WithDefaulter(webhook.NewScheduledSparkApplicationDefaulter()).
WithValidator(webhook.NewScheduledSparkApplicationValidator()).
Complete(); err != nil {
logger.Error(err, "Failed to create mutating webhook for Scheduled Spark application")
os.Exit(1)
}
if err := ctrl.NewWebhookManagedBy(mgr).
For(&corev1.Pod{}).
WithDefaulter(webhook.NewSparkPodDefaulter(mgr.GetClient(), namespaces)).
Complete(); err != nil {
logger.Error(err, "Failed to create mutating webhook for Spark pod")
os.Exit(1)
}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", mgr.GetWebhookServer().StartedChecker()); err != nil {
logger.Error(err, "Failed to set up health check")
os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", mgr.GetWebhookServer().StartedChecker()); err != nil {
logger.Error(err, "Failed to set up ready check")
os.Exit(1)
}
logger.Info("Starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
logger.Error(err, "Failed to start manager")
os.Exit(1)
}
}
// setupLog configures the logging system.
func setupLog() {
ctrl.SetLogger(logzap.New(
logzap.UseFlagOptions(&zapOptions),
func(o *logzap.Options) {
o.Development = development
}, func(o *logzap.Options) {
o.ZapOpts = append(o.ZapOpts, zap.AddCaller())
}, func(o *logzap.Options) {
var config zapcore.EncoderConfig
if !development {
config = zap.NewProductionEncoderConfig()
} else {
config = zap.NewDevelopmentEncoderConfig()
}
config.EncodeLevel = zapcore.CapitalColorLevelEncoder
config.EncodeTime = zapcore.ISO8601TimeEncoder
config.EncodeCaller = zapcore.ShortCallerEncoder
o.Encoder = zapcore.NewConsoleEncoder(config)
}),
)
}
func newTLSOptions() []func(c *tls.Config) {
// if the enable-http2 flag is false (the default), http/2 should be disabled
// due to its vulnerabilities. More specifically, disabling http/2 will
// prevent the servers from being vulnerable to the HTTP/2 Stream Cancellation and
// Rapid Reset CVEs. For more information see:
// - https://github.com/advisories/GHSA-qppj-fm5r-hxr3
// - https://github.com/advisories/GHSA-4374-p667-p6c8
disableHTTP2 := func(c *tls.Config) {
logger.Info("disabling http/2")
c.NextProtos = []string{"http/1.1"}
}
tlsOpts := []func(*tls.Config){}
if !enableHTTP2 {
tlsOpts = append(tlsOpts, disableHTTP2)
}
return tlsOpts
}
// newCacheOptions creates and returns a cache.Options instance configured with default namespaces and object caching settings.
func newCacheOptions() cache.Options {
defaultNamespaces := make(map[string]cache.Config)
if !util.ContainsString(namespaces, cache.AllNamespaces) {
for _, ns := range namespaces {
defaultNamespaces[ns] = cache.Config{}
}
}
byObject := map[client.Object]cache.ByObject{
&corev1.Pod{}: {
Label: labels.SelectorFromSet(labels.Set{
common.LabelLaunchedBySparkOperator: "true",
}),
},
&v1beta2.SparkApplication{}: {},
&v1beta2.ScheduledSparkApplication{}: {},
&admissionregistrationv1.MutatingWebhookConfiguration{}: {
Field: fields.SelectorFromSet(fields.Set{
"metadata.name": mutatingWebhookName,
}),
},
&admissionregistrationv1.ValidatingWebhookConfiguration{}: {
Field: fields.SelectorFromSet(fields.Set{
"metadata.name": validatingWebhookName,
}),
},
}
if enableResourceQuotaEnforcement {
byObject[&corev1.ResourceQuota{}] = cache.ByObject{}
}
options := cache.Options{
Scheme: scheme,
DefaultNamespaces: defaultNamespaces,
ByObject: byObject,
}
return options
}

View File

@@ -1,35 +0,0 @@
# The following manifests contain a self-signed issuer CR and a certificate CR.
# More documentation can be found at https://docs.cert-manager.io
# WARNING: Targets CertManager v1.0. Check https://cert-manager.io/docs/installation/upgrading/ for breaking changes.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
labels:
app.kubernetes.io/name: spark-operator
app.kubernetes.io/managed-by: kustomize
name: selfsigned-issuer
namespace: system
spec:
selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
labels:
app.kubernetes.io/name: certificate
app.kubernetes.io/instance: serving-cert
app.kubernetes.io/component: certificate
app.kubernetes.io/created-by: spark-operator
app.kubernetes.io/part-of: spark-operator
app.kubernetes.io/managed-by: kustomize
name: serving-cert # this name should match the one that appears in kustomizeconfig.yaml
namespace: system
spec:
# SERVICE_NAME and SERVICE_NAMESPACE will be substituted by kustomize
dnsNames:
- SERVICE_NAME.SERVICE_NAMESPACE.svc
- SERVICE_NAME.SERVICE_NAMESPACE.svc.cluster.local
issuerRef:
kind: Issuer
name: selfsigned-issuer
secretName: webhook-server-cert # this secret will not be prefixed, since it's not managed by kustomize
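After kustomize substitutes SERVICE_NAME and SERVICE_NAMESPACE, the dnsNames resolve to the webhook Service's in-cluster DNS names. For a service named spark-webhook in namespace spark-operator (names assumed for illustration), the rendered block would read:

dnsNames:
  - spark-webhook.spark-operator.svc
  - spark-webhook.spark-operator.svc.cluster.local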

View File

@@ -1,5 +0,0 @@
resources:
- certificate.yaml
configurations:
- kustomizeconfig.yaml

Some files were not shown because too many files have changed in this diff.