mirror of https://github.com/kubeflow/katib.git
Compare commits
4 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 0a7453d212 | |
| | 12a4896ae0 | |
| | 8dcc7d3398 | |
| | 73177dc229 | |
@@ -4,3 +4,5 @@ docs
|
|||
manifests
|
||||
pkg/ui/*/frontend/node_modules
|
||||
pkg/ui/*/frontend/build
|
||||
pkg/new-ui/*/frontend/node_modules
|
||||
pkg/new-ui/*/frontend/build
|
||||
|
|
4
.flake8
@@ -1,4 +0,0 @@
-[flake8]
-max-line-length = 100
-# E203 is ignored to avoid conflicts with Black's formatting, as it's not PEP 8 compliant
-extend-ignore = W503, E203
@@ -0,0 +1,26 @@
+---
+name: Bug report
+about: Tell us about a problem you are experiencing
+---
+
+/kind bug
+
+**What steps did you take and what happened:**
+[A clear and concise description of what the bug is.]
+
+**What did you expect to happen:**
+
+**Anything else you would like to add:**
+[Miscellaneous information that will assist in solving the issue.]
+
+**Environment:**
+
+- Katib version (check the Katib controller image version):
+- Kubernetes version: (`kubectl version`):
+- OS (`uname -a`):
+
+---
+
+<!-- Don't delete this message to encourage users to support your issue! -->
+
+Impacted by this bug? Give it a 👍 We prioritize the issues with the most 👍
@@ -1,50 +0,0 @@
|
|||
name: Bug Report
|
||||
description: Tell us about a problem you are experiencing with Katib
|
||||
labels: ["kind/bug", "lifecycle/needs-triage"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for taking the time to fill out this Katib bug report!
|
||||
- type: textarea
|
||||
id: problem
|
||||
attributes:
|
||||
label: What happened?
|
||||
description: |
|
||||
Please provide as much info as possible. Not doing so may result in your bug not being
|
||||
addressed in a timely manner.
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: expected
|
||||
attributes:
|
||||
label: What did you expect to happen?
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: environment
|
||||
attributes:
|
||||
label: Environment
|
||||
value: |
|
||||
Kubernetes version:
|
||||
```bash
|
||||
$ kubectl version
|
||||
|
||||
```
|
||||
Katib controller version:
|
||||
```bash
|
||||
$ kubectl get pods -n kubeflow -l katib.kubeflow.org/component=controller -o jsonpath="{.items[*].spec.containers[*].image}"
|
||||
|
||||
```
|
||||
Katib Python SDK version:
|
||||
```bash
|
||||
$ pip show kubeflow-katib
|
||||
|
||||
```
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
id: votes
|
||||
attributes:
|
||||
label: Impacted by this bug?
|
||||
value: Give it a 👍 We prioritize the issues with most 👍
|
|
@@ -1,12 +1,9 @@
|
|||
blank_issues_enabled: true
|
||||
blank_issues_enabled: false
|
||||
|
||||
contact_links:
|
||||
- name: Katib Documentation
|
||||
url: https://www.kubeflow.org/docs/components/katib/
|
||||
about: Much help can be found in the docs
|
||||
- name: Kubeflow Katib Slack Channel
|
||||
url: https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels
|
||||
about: Ask the Katib community on CNCF Slack
|
||||
- name: Kubeflow Katib Community Meeting
|
||||
url: https://bit.ly/2PWVCkV
|
||||
about: Join the Kubeflow AutoML working group meeting
|
||||
- name: AutoML Slack Channel
|
||||
url: https://kubeflow.slack.com/archives/C018PMV53NW
|
||||
about: Ask the Katib community on Slack
|
||||
|
|
|
@@ -0,0 +1,18 @@
+---
+name: Feature enhancement request
+about: Suggest an idea for this project
+---
+
+/kind feature
+
+**Describe the solution you'd like**
+[A clear and concise description of what you want to happen.]
+
+**Anything else you would like to add:**
+[Miscellaneous information that will assist in solving the issue.]
+
+---
+
+<!-- Don't delete this message to encourage users to support your issue! -->
+
+Love this feature? Give it a 👍 We prioritize the features with the most 👍
@@ -1,28 +0,0 @@
|
|||
name: Feature Request
|
||||
description: Suggest an idea for Katib
|
||||
labels: ["kind/feature", "lifecycle/needs-triage"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for taking the time to fill out this Katib feature request!
|
||||
- type: textarea
|
||||
id: feature
|
||||
attributes:
|
||||
label: What you would like to be added?
|
||||
description: |
|
||||
A clear and concise description of what you want to add to Katib.
|
||||
Please consider to write Katib enhancement proposal if it is a large feature request.
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: rationale
|
||||
attributes:
|
||||
label: Why is this needed?
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
id: votes
|
||||
attributes:
|
||||
label: Love this feature?
|
||||
value: Give it a 👍 We prioritize the features with most 👍
|
|
@@ -1,6 +1,6 @@
|
|||
<!-- Thanks for sending a pull request! Here are some tips for you:
|
||||
1. If this is your first time, check our contributor guidelines https://www.kubeflow.org/docs/about/contributing
|
||||
2. To know more about Katib components, check developer guide https://github.com/kubeflow/katib/blob/master/CONTRIBUTING.md
|
||||
2. To know more about Katib components, check developer guide https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md
|
||||
3. If you want *faster* PR reviews, check how: https://git.k8s.io/community/contributors/guide/pull-requests.md#best-practices-for-faster-reviews
|
||||
-->
|
||||
|
||||
|
|
|
@@ -0,0 +1,20 @@
+# Configuration for stale probot https://probot.github.io/apps/stale/
+
+# Number of days of inactivity before an issue becomes stale
+daysUntilStale: 90
+# Number of days of inactivity before a stale issue is closed
+daysUntilClose: 20
+# Issues with these labels will never be considered stale
+exemptLabels:
+  - lifecycle/frozen
+# Label to use when marking an issue as stale
+staleLabel: lifecycle/stale
+# Comment to post when marking an issue as stale. Set to `false` to disable
+markComment: >
+  This issue has been automatically marked as stale because it has not had
+  recent activity. It will be closed if no further activity occurs. Thank you
+  for your contributions.
+# Comment to post when closing a stale issue. Set to `false` to disable
+closeComment: >
+  This issue has been automatically closed because it has not had recent
+  activity. Please comment "/reopen" to reopen it.
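The two thresholds in the stale configuration above (`daysUntilStale: 90`, `daysUntilClose: 20`) can be read as a simple timeline. The following is a rough sketch only: `issue_state` is a hypothetical helper, and it assumes no activity at all after the last update and ignores the `lifecycle/frozen` exemption and the bot's own scheduling, so the real bot may act somewhat later.

```shell
# Thresholds copied from the configuration above.
days_until_stale=90
days_until_close=20

# Hypothetical helper: classify an issue by whole days since its last activity.
issue_state() {
  inactive_days="$1"
  if [ "$inactive_days" -lt "$days_until_stale" ]; then
    echo "active"
  elif [ "$inactive_days" -lt $((days_until_stale + days_until_close)) ]; then
    echo "stale"
  else
    echo "closed"
  fi
}

issue_state 30    # prints: active
issue_state 95    # prints: stale
issue_state 110   # prints: closed
```

Under these assumptions an untouched issue is labeled `lifecycle/stale` after 90 days and closed roughly 20 days later.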
@@ -1,81 +0,0 @@
|
|||
# Reusable workflows for publishing Katib images.
|
||||
name: Build and Publish Images
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
component-name:
|
||||
required: true
|
||||
type: string
|
||||
platforms:
|
||||
required: true
|
||||
type: string
|
||||
dockerfile:
|
||||
required: true
|
||||
type: string
|
||||
secrets:
|
||||
DOCKERHUB_USERNAME:
|
||||
required: false
|
||||
DOCKERHUB_TOKEN:
|
||||
required: false
|
||||
|
||||
jobs:
|
||||
build-and-publish:
|
||||
name: Build and Publish Images
|
||||
runs-on: ubuntu-22.04
|
||||
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Set Publish Condition
|
||||
id: publish-condition
|
||||
shell: bash
|
||||
run: |
|
||||
if [[ "${{ github.repository }}" == 'kubeflow/katib' && \
|
||||
( "${{ github.ref }}" == 'refs/heads/master' || \
|
||||
"${{ github.ref }}" =~ ^refs/heads/release- || \
|
||||
"${{ github.ref }}" =~ ^refs/tags/v ) ]]; then
|
||||
echo "should_publish=true" >> $GITHUB_OUTPUT
|
||||
else
|
||||
echo "should_publish=false" >> $GITHUB_OUTPUT
|
||||
fi
|
||||
|
||||
- name: GHCR Login
|
||||
if: steps.publish-condition.outputs.should_publish == 'true'
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
registry: ghcr.io
|
||||
username: ${{ github.actor }}
|
||||
password: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
- name: DockerHub Login
|
||||
if: steps.publish-condition.outputs.should_publish == 'true'
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
registry: docker.io
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
|
||||
- name: Publish Component ${{ inputs.component-name }}
|
||||
if: steps.publish-condition.outputs.should_publish == 'true'
|
||||
id: publish
|
||||
uses: ./.github/workflows/template-publish-image
|
||||
with:
|
||||
image: |
|
||||
ghcr.io/kubeflow/katib/${{ inputs.component-name }}
|
||||
docker.io/kubeflowkatib/${{ inputs.component-name }}
|
||||
dockerfile: ${{ inputs.dockerfile }}
|
||||
platforms: ${{ inputs.platforms }}
|
||||
push: true
|
||||
|
||||
- name: Test Build For Component ${{ inputs.component-name }}
|
||||
if: steps.publish.outcome == 'skipped'
|
||||
uses: ./.github/workflows/template-publish-image
|
||||
with:
|
||||
image: |
|
||||
ghcr.io/kubeflow/katib/${{ inputs.component-name }}
|
||||
docker.io/kubeflowkatib/${{ inputs.component-name }}
|
||||
dockerfile: ${{ inputs.dockerfile }}
|
||||
platforms: ${{ inputs.platforms }}
|
||||
push: false
|
|
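The `Set Publish Condition` step in the reusable workflow above decides whether images are pushed or only test-built. A minimal sketch of the same decision as a POSIX shell function follows. `should_publish` is a hypothetical name, the fork name in the last call is made up for illustration, and `case` glob patterns stand in here for the `[[ ... =~ ... ]]` prefix checks used in the workflow.

```shell
# Hypothetical helper: takes the repository and the Git ref as arguments
# instead of reading the github.* context values.
should_publish() {
  repo="$1"
  ref="$2"

  # Only the upstream repository ever publishes.
  if [ "$repo" != "kubeflow/katib" ]; then
    echo "false"
    return
  fi

  # master, release-* branches, and v* tags publish; everything else only builds.
  case "$ref" in
    refs/heads/master | refs/heads/release-* | refs/tags/v*) echo "true" ;;
    *) echo "false" ;;
  esac
}

should_publish "kubeflow/katib" "refs/heads/master"     # prints: true
should_publish "kubeflow/katib" "refs/pull/1/merge"     # prints: false
should_publish "example-fork/katib" "refs/tags/v0.1.0"  # prints: false
```

When the result is `false`, the login and publish steps are skipped and the `Test Build For Component` step runs with `push: false` instead.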
@@ -1,27 +1,22 @@
|
|||
name: E2E Test with darts-cnn-cifar10
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
- pull_request
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-20.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
with:
|
||||
kubernetes-version: ${{ matrix.kubernetes-version }}
|
||||
python-version: "3.11"
|
||||
|
||||
- name: Run e2e test with ${{ matrix.experiments }} experiments
|
||||
uses: ./.github/workflows/template-e2e-test
|
||||
|
@@ -33,6 +28,8 @@ jobs:
|
|||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
||||
# TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
|
||||
# kubernetes-version: ["v1.20.15", "v1.21.13", "v1.22.10", "v1.23.7", "v1.24.1"]
|
||||
kubernetes-version: ["v1.21.13", "v1.22.10", "v1.23.7"]
|
||||
# Comma Delimited
|
||||
experiments: ["darts-cpu"]
|
|
@@ -1,40 +0,0 @@
|
|||
name: E2E Test with tune API
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
with:
|
||||
kubernetes-version: ${{ matrix.kubernetes-version }}
|
||||
|
||||
- name: Install Katib SDK with extra requires
|
||||
shell: bash
|
||||
run: |
|
||||
pip install --prefer-binary -e 'sdk/python/v1beta1[huggingface]'
|
||||
|
||||
- name: Run e2e test with tune API
|
||||
uses: ./.github/workflows/template-e2e-test
|
||||
with:
|
||||
tune-api: true
|
||||
training-operator: true
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
# Detail: https://hub.docker.com/r/kindest/node
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
|
@@ -1,35 +0,0 @@
|
|||
name: E2E Test with Katib UI, random search, and postgres
|
||||
|
||||
on:
|
||||
- pull_request
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
with:
|
||||
kubernetes-version: ${{ matrix.kubernetes-version }}
|
||||
|
||||
- name: Run e2e test with ${{ matrix.experiments }} experiments
|
||||
uses: ./.github/workflows/template-e2e-test
|
||||
with:
|
||||
experiments: random
|
||||
# Comma Delimited
|
||||
trial-images: pytorch-mnist-cpu
|
||||
katib-ui: true
|
||||
database-type: postgres
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
|
@@ -1,27 +1,22 @@
|
|||
name: E2E Test with enas-cnn-cifar10
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
- pull_request
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-20.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
with:
|
||||
kubernetes-version: ${{ matrix.kubernetes-version }}
|
||||
python-version: "3.8"
|
||||
|
||||
- name: Run e2e test with ${{ matrix.experiments }} experiments
|
||||
uses: ./.github/workflows/template-e2e-test
|
||||
|
@@ -33,6 +28,8 @@ jobs:
|
|||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
||||
# TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
|
||||
# kubernetes-version: ["v1.20.15", "v1.21.13", "v1.22.10", "v1.23.7", "v1.24.1"]
|
||||
kubernetes-version: ["v1.21.13", "v1.22.10", "v1.23.7"]
|
||||
# Comma Delimited
|
||||
experiments: ["enas-cpu"]
|
|
@@ -1,49 +0,0 @@
|
|||
name: Free-Up Disk Space
|
||||
description: Remove Non-Essential Tools And Move Docker Data Directory to /mnt/docker
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
# This step is a Workaround to avoid the "No space left on device" error.
|
||||
# ref: https://github.com/actions/runner-images/issues/2840
|
||||
- name: Remove unnecessary files
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Disk usage before cleanup:"
|
||||
df -hT
|
||||
|
||||
sudo rm -rf /usr/share/dotnet
|
||||
sudo rm -rf /opt/ghc
|
||||
sudo rm -rf /usr/local/share/boost
|
||||
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
|
||||
sudo rm -rf /usr/local/lib/android
|
||||
sudo rm -rf /usr/local/share/powershell
|
||||
sudo rm -rf /usr/share/swift
|
||||
|
||||
echo "Disk usage after cleanup:"
|
||||
df -hT
|
||||
|
||||
- name: Prune docker images
|
||||
shell: bash
|
||||
run: |
|
||||
docker image prune -a -f
|
||||
docker system df
|
||||
df -hT
|
||||
|
||||
- name: Move docker data directory
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Stopping docker service ..."
|
||||
sudo systemctl stop docker
|
||||
DOCKER_DEFAULT_ROOT_DIR=/var/lib/docker
|
||||
DOCKER_ROOT_DIR=/mnt/docker
|
||||
echo "Moving ${DOCKER_DEFAULT_ROOT_DIR} -> ${DOCKER_ROOT_DIR}"
|
||||
sudo mv ${DOCKER_DEFAULT_ROOT_DIR} ${DOCKER_ROOT_DIR}
|
||||
echo "Creating symlink ${DOCKER_DEFAULT_ROOT_DIR} -> ${DOCKER_ROOT_DIR}"
|
||||
sudo ln -s ${DOCKER_ROOT_DIR} ${DOCKER_DEFAULT_ROOT_DIR}
|
||||
echo "$(sudo ls -l ${DOCKER_DEFAULT_ROOT_DIR})"
|
||||
echo "Starting docker service ..."
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl start docker
|
||||
echo "Docker service status:"
|
||||
sudo systemctl --no-pager -l -o short status docker
|
|
@@ -0,0 +1,32 @@
+name: E2E Test for katib-ui
+on:
+  - pull_request
+
+env:
+  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+jobs:
+  e2e:
+    runs-on: ubuntu-20.04
+    timeout-minutes: 120
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+
+      - name: Setup Test Env
+        uses: ./.github/workflows/template-setup-e2e-test
+        with:
+          kubernetes-version: ${{ matrix.kubernetes-version }}
+
+      - name: Set Up Minikube Cluster
+        run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh true
+
+      - name: Start Katib
+        run: ./test/e2e/v1beta1/scripts/gh-actions/setup-katib.sh true false
+
+    strategy:
+      fail-fast: false
+      matrix:
+        # TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
+        # kubernetes-version: ["v1.20.15", "v1.21.13", "v1.22.10", "v1.23.7", "v1.24.1"]
+        kubernetes-version: ["v1.21.13", "v1.22.10", "v1.23.7"]
@@ -0,0 +1,23 @@
+name: Lint YAML files
+
+on:
+  - push
+  - pull_request
+
+jobs:
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v2
+
+      - name: Setup Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.9
+
+      - name: Check YAML
+        run: make yamllint
+
@@ -0,0 +1,41 @@
+name: E2E Test with mxnet-mnist
+on:
+  - pull_request
+
+env:
+  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+jobs:
+  e2e:
+    runs-on: ubuntu-20.04
+    timeout-minutes: 120
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+
+      - name: Setup Test Env
+        uses: ./.github/workflows/template-setup-e2e-test
+        with:
+          kubernetes-version: ${{ matrix.kubernetes-version }}
+
+      - name: Run e2e test with ${{ matrix.experiments }} experiments
+        uses: ./.github/workflows/template-e2e-test
+        with:
+          experiments: ${{ matrix.experiments }}
+          # Comma Delimited
+          trial-images: mxnet-mnist
+
+    strategy:
+      fail-fast: false
+      matrix:
+        # TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
+        # kubernetes-version: ["v1.20.15", "v1.21.13", "v1.22.10", "v1.23.7", "v1.24.1"]
+        kubernetes-version: ["v1.21.13", "v1.22.10", "v1.23.7"]
+        # Comma Delimited
+        experiments:
+          # suggestion-hyperopt
+          - "random,tpe,never-resume"
+          - "median-stop,from-volume-resume"
+          # others
+          - "grid,bayesian-optimization,tpe"
+          - "multivariate-tpe,cma-es,hyperband"
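The `strategy.matrix` in the mxnet-mnist workflow above combines every Kubernetes version with every comma-delimited experiment group. The sketch below only illustrates that cross product and the comma splitting. The values are copied from the matrix, while the loop, the variable names, and the use of `tr` are assumptions and not the project's actual test scripts.

```shell
# Values copied from the workflow matrix above.
versions="v1.21.13 v1.22.10 v1.23.7"
experiment_groups="random,tpe,never-resume median-stop,from-volume-resume grid,bayesian-optimization,tpe multivariate-tpe,cma-es,hyperband"

count=0
for version in $versions; do
  for group in $experiment_groups; do
    count=$((count + 1))
    # Split the comma-delimited group into individual experiment names.
    names=$(echo "$group" | tr ',' ' ')
    echo "job $count: kubernetes=$version experiments=$names"
  done
done

echo "total jobs: $count"   # prints: total jobs: 12
```

Three versions times four experiment groups gives twelve jobs, and `fail-fast: false` lets the remaining jobs continue when one of them fails.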
@@ -2,21 +2,28 @@ name: Publish AutoML Algorithm Images
|
|||
|
||||
on:
|
||||
push:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
branches:
|
||||
- master
|
||||
|
||||
env:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
algorithm:
|
||||
name: Publish Image
|
||||
uses: ./.github/workflows/build-and-publish-images.yaml
|
||||
with:
|
||||
component-name: ${{ matrix.component-name }}
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
secrets:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
# Trigger workflow only for kubeflow/katib repository.
|
||||
if: github.repository == 'kubeflow/katib'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Publish Component ${{ matrix.component-name }}
|
||||
uses: ./.github/workflows/template-publish-image
|
||||
with:
|
||||
image: docker.io/kubeflowkatib/${{ matrix.component-name }}
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
|
@@ -24,6 +31,8 @@ jobs:
|
|||
include:
|
||||
- component-name: suggestion-hyperopt
|
||||
dockerfile: cmd/suggestion/hyperopt/v1beta1/Dockerfile
|
||||
- component-name: suggestion-chocolate
|
||||
dockerfile: cmd/suggestion/chocolate/v1beta1/Dockerfile
|
||||
- component-name: suggestion-hyperband
|
||||
dockerfile: cmd/suggestion/hyperband/v1beta1/Dockerfile
|
||||
- component-name: suggestion-skopt
|
||||
|
|
|
@@ -1,24 +0,0 @@
|
|||
name: Publish Katib Conformance Test Images
|
||||
|
||||
on:
|
||||
- push
|
||||
- pull_request
|
||||
|
||||
jobs:
|
||||
core:
|
||||
name: Publish Image
|
||||
uses: ./.github/workflows/build-and-publish-images.yaml
|
||||
with:
|
||||
component-name: ${{ matrix.component-name }}
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
secrets:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
include:
|
||||
- component-name: katib-conformance
|
||||
dockerfile: Dockerfile.conformance
|
|
@@ -1,20 +1,29 @@
|
|||
name: Publish Katib Core Images
|
||||
|
||||
on:
|
||||
- push
|
||||
- pull_request
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
|
||||
env:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
core:
|
||||
name: Publish Image
|
||||
uses: ./.github/workflows/build-and-publish-images.yaml
|
||||
with:
|
||||
component-name: ${{ matrix.component-name }}
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
secrets:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
# Trigger workflow only for kubeflow/katib repository.
|
||||
if: github.repository == 'kubeflow/katib'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Publish Component ${{ matrix.component-name }}
|
||||
uses: ./.github/workflows/template-publish-image
|
||||
with:
|
||||
image: docker.io/kubeflowkatib/${{ matrix.component-name }}
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
|
@@ -25,7 +34,9 @@ jobs:
|
|||
- component-name: katib-db-manager
|
||||
dockerfile: cmd/db-manager/v1beta1/Dockerfile
|
||||
- component-name: katib-ui
|
||||
dockerfile: cmd/ui/v1beta1/Dockerfile
|
||||
dockerfile: cmd/new-ui/v1beta1/Dockerfile
|
||||
- component-name: cert-generator
|
||||
dockerfile: cmd/cert-generator/v1beta1/Dockerfile
|
||||
- component-name: file-metrics-collector
|
||||
dockerfile: cmd/metricscollector/v1beta1/file-metricscollector/Dockerfile
|
||||
- component-name: tfevent-metrics-collector
|
||||
|
|
|
@@ -2,47 +2,48 @@ name: Publish Trial Images
|
|||
|
||||
on:
|
||||
push:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
branches:
|
||||
- master
|
||||
|
||||
env:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
trial:
|
||||
name: Publish Image
|
||||
uses: ./.github/workflows/build-and-publish-images.yaml
|
||||
with:
|
||||
component-name: ${{ matrix.trial-name }}
|
||||
platforms: ${{ matrix.platforms }}
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
secrets:
|
||||
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
# Trigger workflow only for kubeflow/katib repository.
|
||||
if: github.repository == 'kubeflow/katib'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Publish Trial ${{ matrix.trial-name }}
|
||||
uses: ./.github/workflows/template-publish-image
|
||||
with:
|
||||
image: docker.io/kubeflowkatib/${{ matrix.trial-name }}
|
||||
dockerfile: ${{ matrix.dockerfile }}
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
include:
|
||||
- trial-name: mxnet-mnist
|
||||
dockerfile: examples/v1beta1/trial-images/mxnet-mnist/Dockerfile
|
||||
- trial-name: pytorch-mnist-cpu
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.cpu
|
||||
- trial-name: pytorch-mnist-gpu
|
||||
platforms: linux/amd64
|
||||
dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.gpu
|
||||
- trial-name: tf-mnist-with-summaries
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: examples/v1beta1/trial-images/tf-mnist-with-summaries/Dockerfile
|
||||
- trial-name: enas-cnn-cifar10-gpu
|
||||
platforms: linux/amd64
|
||||
dockerfile: examples/v1beta1/trial-images/enas-cnn-cifar10/Dockerfile.gpu
|
||||
- trial-name: enas-cnn-cifar10-cpu
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: examples/v1beta1/trial-images/enas-cnn-cifar10/Dockerfile.cpu
|
||||
- trial-name: darts-cnn-cifar10-cpu
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: examples/v1beta1/trial-images/darts-cnn-cifar10/Dockerfile.cpu
|
||||
- trial-name: darts-cnn-cifar10-gpu
|
||||
platforms: linux/amd64
|
||||
dockerfile: examples/v1beta1/trial-images/darts-cnn-cifar10/Dockerfile.gpu
|
||||
- trial-name: simple-pbt
|
||||
platforms: linux/amd64,linux/arm64
|
||||
dockerfile: examples/v1beta1/trial-images/simple-pbt/Dockerfile
|
||||
|
|
|
@@ -1,27 +1,22 @@
|
|||
name: E2E Test with pytorch-mnist
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
- pull_request
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-20.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
with:
|
||||
kubernetes-version: ${{ matrix.kubernetes-version }}
|
||||
python-version: "3.10"
|
||||
|
||||
- name: Run e2e test with ${{ matrix.experiments }} experiments
|
||||
uses: ./.github/workflows/template-e2e-test
|
||||
|
@@ -34,13 +29,10 @@ jobs:
|
|||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
||||
# TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
|
||||
# kubernetes-version: ["v1.20.15", "v1.21.13", "v1.22.10", "v1.23.7", "v1.24.1"]
|
||||
kubernetes-version: ["v1.21.13", "v1.22.10", "v1.23.7"]
|
||||
# Comma Delimited
|
||||
experiments:
|
||||
# suggestion-hyperopt
|
||||
- "long-running-resume,from-volume-resume,median-stop"
|
||||
# others
|
||||
- "grid,bayesian-optimization,tpe,multivariate-tpe,cma-es,hyperband"
|
||||
- "hyperopt-distribution,optuna-distribution"
|
||||
- "file-metrics-collector,pytorchjob-mnist"
|
||||
- "median-stop-with-json-format,file-metrics-collector-with-json-format"
|
|
@@ -1,21 +1,17 @@
|
|||
name: E2E Test with simple-pbt
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
- pull_request
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-20.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
|
@@ -33,6 +29,8 @@ jobs:
|
|||
fail-fast: false
|
||||
matrix:
|
||||
# Detail: https://hub.docker.com/r/kindest/node
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
||||
# TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
|
||||
# kubernetes-version: ["v1.20.15", "v1.21.12", "v1.22.9", "v1.23.6", "v1.24.1"]
|
||||
kubernetes-version: ["v1.21.12", "v1.22.9", "v1.23.6"]
|
||||
# Comma Delimited
|
||||
experiments: ["simple-pbt"]
|
|
@@ -1,42 +0,0 @@
|
|||
# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
|
||||
#
|
||||
# You can adjust the behavior by modifying this file.
|
||||
# For more information, see:
|
||||
# https://github.com/actions/stale
|
||||
name: Mark stale issues and pull requests
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 */5 * * *"
|
||||
|
||||
jobs:
|
||||
stale:
|
||||
runs-on: ubuntu-22.04
|
||||
permissions:
|
||||
issues: write
|
||||
pull-requests: write
|
||||
|
||||
steps:
|
||||
- uses: actions/stale@v5
|
||||
with:
|
||||
repo-token: ${{ secrets.GITHUB_TOKEN }}
|
||||
days-before-stale: 90
|
||||
days-before-close: 20
|
||||
stale-issue-message: >
|
||||
This issue has been automatically marked as stale because it has not had
|
||||
recent activity. It will be closed if no further activity occurs. Thank you
|
||||
for your contributions.
|
||||
close-issue-message: >
|
||||
This issue has been automatically closed because it has not had recent
|
||||
activity. Please comment "/reopen" to reopen it.
|
||||
stale-issue-label: lifecycle/stale
|
||||
exempt-issue-labels: lifecycle/frozen
|
||||
stale-pr-message: >
|
||||
This pull request has been automatically marked as stale because it has not had
|
||||
recent activity. It will be closed if no further activity occurs. Thank you
|
||||
for your contributions.
|
||||
close-pr-message: >
|
||||
This pull request has been automatically closed because it has not had recent
|
||||
activity. Please comment "/reopen" to reopen it.
|
||||
stale-pr-label: lifecycle/stale
|
||||
exempt-pr-labels: lifecycle/frozen
|
|
@@ -1,49 +1,31 @@
|
|||
# Composite action for e2e tests.
|
||||
name: Run E2E Test
|
||||
description: Run e2e test using the minikube cluster
|
||||
# Template for e2e tests.
|
||||
|
||||
inputs:
|
||||
experiments:
|
||||
required: false
|
||||
description: comma delimited experiment name
|
||||
default: ""
|
||||
required: true
|
||||
type: string
|
||||
training-operator:
|
||||
required: false
|
||||
description: whether to deploy training-operator or not
|
||||
default: false
|
||||
type: boolean
|
||||
trial-images:
|
||||
required: false
|
||||
description: comma delimited trial image name
|
||||
default: ""
|
||||
required: true
|
||||
type: string
|
||||
katib-ui:
|
||||
required: true
|
||||
description: whether to deploy katib-ui or not
|
||||
default: false
|
||||
database-type:
|
||||
required: false
|
||||
description: mysql or postgres
|
||||
default: mysql
|
||||
tune-api:
|
||||
required: true
|
||||
description: whether to execute tune-api test or not
|
||||
type: boolean
|
||||
default: false
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
- name: Setup Minikube Cluster
|
||||
- name: Set Up Minikube Cluster
|
||||
shell: bash
|
||||
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh ${{ inputs.katib-ui }} ${{ inputs.tune-api }} ${{ inputs.trial-images }} ${{ inputs.experiments }}
|
||||
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh ${{ inputs.katib-ui }} ${{ inputs.trial-images }} ${{ inputs.experiments }}
|
||||
|
||||
- name: Setup Katib
|
||||
- name: Set Up Katib
|
||||
shell: bash
|
||||
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-katib.sh ${{ inputs.katib-ui }} ${{ inputs.training-operator }} ${{ inputs.database-type }}
|
||||
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-katib.sh ${{ inputs.katib-ui }} ${{ inputs.training-operator }}
|
||||
|
||||
- name: Run E2E Experiment
|
||||
shell: bash
|
||||
run: |
|
||||
if "${{ inputs.tune-api }}"; then
|
||||
./test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.sh
|
||||
else
|
||||
./test/e2e/v1beta1/scripts/gh-actions/run-e2e-experiment.sh ${{ inputs.experiments }}
|
||||
fi
|
||||
run: ./test/e2e/v1beta1/scripts/gh-actions/run-e2e-experiment.sh ${{ inputs.experiments }}
|
||||
|
|
|
@ -1,49 +1,28 @@
|
|||
# Composite action for publishing Katib images.
|
||||
name: Build And Publish Container Images
|
||||
description: Build MultiPlatform Supporting Container Images
|
||||
# Template run for publishing Katib images.
|
||||
|
||||
inputs:
|
||||
image:
|
||||
required: true
|
||||
description: image tag
|
||||
type: string
|
||||
dockerfile:
|
||||
required: true
|
||||
description: path for dockerfile
|
||||
platforms:
|
||||
required: true
|
||||
description: linux/amd64 or linux/amd64,linux/arm64
|
||||
push:
|
||||
required: true
|
||||
description: whether to push container images or not
|
||||
type: string
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
# This step is a Workaround to avoid the "No space left on device" error.
|
||||
# ref: https://github.com/actions/runner-images/issues/2840
|
||||
- name: Remove unnecessary files
|
||||
shell: bash
|
||||
run: |
|
||||
sudo rm -rf /usr/share/dotnet
|
||||
sudo rm -rf /opt/ghc
|
||||
sudo rm -rf "/usr/local/share/boost"
|
||||
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
|
||||
sudo rm -rf /usr/local/lib/android
|
||||
sudo rm -rf /usr/local/share/powershell
|
||||
sudo rm -rf /usr/share/swift
|
||||
|
||||
echo "Disk usage after cleanup:"
|
||||
df -h
|
||||
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@v3
|
||||
|
||||
- name: Set Up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
uses: docker/setup-buildx-action@v1
|
||||
|
||||
- name: Docker Login
|
||||
uses: docker/login-action@v1
|
||||
with:
|
||||
username: ${{ env.DOCKERHUB_USERNAME }}
|
||||
password: ${{ env.DOCKERHUB_TOKEN }}
|
||||
|
||||
- name: Add Docker Tags
|
||||
id: meta
|
||||
uses: docker/metadata-action@v5
|
||||
uses: docker/metadata-action@v3
|
||||
with:
|
||||
images: ${{ inputs.image }}
|
||||
tags: |
|
||||
|
@ -51,12 +30,11 @@ runs:
|
|||
type=sha,prefix=v1beta1-
|
||||
|
||||
- name: Build and Push
|
||||
uses: docker/build-push-action@v5
|
||||
uses: docker/build-push-action@v2
|
||||
with:
|
||||
context: .
|
||||
file: ${{ inputs.dockerfile }}
|
||||
push: ${{ inputs.push }}
|
||||
push: true
|
||||
tags: ${{ steps.meta.outputs.tags }}
|
||||
cache-from: type=gha
|
||||
cache-to: type=gha,mode=max,ignore-error=true
|
||||
platforms: ${{ inputs.platforms }}
|
||||
cache-to: type=gha,mode=max
|
||||
|
|
|
@ -1,48 +1,25 @@
|
|||
# Composite action to setup e2e tests.
|
||||
name: Setup E2E Test
|
||||
description: setup env for e2e test using the minikube cluster
|
||||
# Template for e2e tests.
|
||||
|
||||
inputs:
|
||||
kubernetes-version:
|
||||
required: true
|
||||
description: kubernetes version
|
||||
python-version:
|
||||
required: false
|
||||
description: Python version
|
||||
# Most latest supporting version
|
||||
default: "3.10"
|
||||
type: string
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
# This step is a Workaround to avoid the "No space left on device" error.
|
||||
# ref: https://github.com/actions/runner-images/issues/2840
|
||||
- name: Free-Up Disk Space
|
||||
uses: ./.github/workflows/free-up-disk-space
|
||||
|
||||
- name: Setup kubectl
|
||||
uses: azure/setup-kubectl@v4
|
||||
- name: Set Up Minikube Cluster
|
||||
uses: manusa/actions-setup-minikube@v2.6.0
|
||||
with:
|
||||
version: ${{ inputs.kubernetes-version }}
|
||||
minikube version: "v1.25.2"
|
||||
kubernetes version: ${{ inputs.kubernetes-version }}
|
||||
start args: --driver none --wait-timeout=60s
|
||||
github token: ${{ env.GITHUB_TOKEN }}
|
||||
|
||||
- name: Setup Minikube Cluster
|
||||
uses: medyagh/setup-minikube@v0.0.18
|
||||
- name: Set Up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v1
|
||||
|
||||
- name: Set Up Go env
|
||||
uses: actions/setup-go@v2
|
||||
with:
|
||||
network-plugin: cni
|
||||
cni: flannel
|
||||
driver: none
|
||||
kubernetes-version: ${{ inputs.kubernetes-version }}
|
||||
minikube-version: 1.34.0
|
||||
start-args: --wait-timeout=120s
|
||||
|
||||
- name: Setup Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
|
||||
- name: Install Katib SDK
|
||||
shell: bash
|
||||
run: pip install --prefer-binary -e sdk/python/v1beta1
|
||||
go-version: 1.17.10
|
||||
|
|
|
@ -0,0 +1,118 @@
|
|||
name: Charmed Katib
|
||||
|
||||
on:
|
||||
- push
|
||||
- pull_request
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
name: Lint
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
set -eux
|
||||
sudo apt update
|
||||
sudo apt install python3-setuptools
|
||||
sudo pip3 install black flake8
|
||||
|
||||
- name: Check black
|
||||
run: black --check operators/*/src
|
||||
|
||||
- name: Check flake8
|
||||
run: cd operators && flake8 ./katib*/src
|
||||
|
||||
build:
|
||||
name: Test
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out repo
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- uses: balchua/microk8s-actions@v0.2.2
|
||||
with:
|
||||
channel: "1.21/stable"
|
||||
addons: '["dns", "storage", "rbac"]'
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
set -eux
|
||||
sudo apt update
|
||||
sudo apt install -y python3-pip
|
||||
sudo snap install juju --classic
|
||||
sudo snap install juju-bundle --classic
|
||||
sudo snap install juju-wait --classic
|
||||
sudo pip3 install charmcraft==1.3.1
|
||||
|
||||
- name: Build Docker images
|
||||
run: |
|
||||
set -eux
|
||||
images=("katib-controller" "katib-ui" "katib-db-manager")
|
||||
folders=("katib-controller" "ui" "db-manager")
|
||||
for idx in {0..2}; do
|
||||
docker build . \
|
||||
-t docker.io/kubeflowkatib/${images[$idx]}:latest \
|
||||
-f cmd/${folders[$idx]}/v1beta1/Dockerfile
|
||||
docker save docker.io/kubeflowkatib/${images[$idx]} > ${images[$idx]}.tar
|
||||
microk8s ctr image import ${images[$idx]}.tar
|
||||
done
|
||||
|
||||
- name: Deploy Katib
|
||||
env:
|
||||
CHARMCRAFT_DEVELOPER: "1"
|
||||
run: |
|
||||
set -eux
|
||||
cd operators/
|
||||
git clone git://git.launchpad.net/canonical-osm
|
||||
cp -r canonical-osm/charms/interfaces/juju-relation-mysql mysql
|
||||
sg microk8s -c 'juju bootstrap microk8s uk8s'
|
||||
juju add-model kubeflow
|
||||
juju bundle deploy --build --destructive-mode --serial
|
||||
juju wait -wvt 600
|
||||
|
||||
- name: Test Katib
|
||||
run: kubectl apply -f examples/v1beta1/hp-tuning/random.yaml
|
||||
|
||||
- name: Get pod statuses
|
||||
run: kubectl get all -A
|
||||
if: failure()
|
||||
|
||||
- name: Get juju status
|
||||
run: juju status
|
||||
if: failure()
|
||||
|
||||
- name: Get katib-controller workload logs
|
||||
run: kubectl logs --tail 100 -nkubeflow -lapp.kubernetes.io/name=katib-controller
|
||||
if: failure()
|
||||
|
||||
- name: Get katib-controller operator logs
|
||||
run: kubectl logs --tail 100 -nkubeflow -loperator.juju.is/name=katib-controller
|
||||
if: failure()
|
||||
|
||||
- name: Get katib-ui workload logs
|
||||
run: kubectl logs --tail 100 -nkubeflow -lapp.kubernetes.io/name=katib-ui
|
||||
if: failure()
|
||||
|
||||
- name: Get katib-ui operator logs
|
||||
run: kubectl logs --tail 100 -nkubeflow -loperator.juju.is/name=katib-ui
|
||||
if: failure()
|
||||
|
||||
- name: Get katib-db-manager workload logs
|
||||
run: kubectl logs --tail 100 -nkubeflow -lapp.kubernetes.io/name=katib-db-manager
|
||||
if: failure()
|
||||
|
||||
- name: Get katib-db-manager operator logs
|
||||
run: kubectl logs --tail 100 -nkubeflow -loperator.juju.is/name=katib-db-manager
|
||||
if: failure()
|
||||
|
||||
- name: Upload charmcraft logs
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: charmcraft-logs
|
||||
path: /tmp/charmcraft-log-*
|
||||
if: failure()
|
|
@ -1,18 +1,13 @@
|
|||
name: Go Test
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
- push
|
||||
- pull_request
|
||||
|
||||
jobs:
|
||||
generatetests:
|
||||
name: Generate And Format Test
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
GOPATH: ${{ github.workspace }}/go
|
||||
defaults:
|
||||
|
@ -20,22 +15,32 @@ jobs:
|
|||
working-directory: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
|
||||
|
||||
- name: Setup Go
|
||||
uses: actions/setup-go@v5
|
||||
uses: actions/setup-go@v2
|
||||
with:
|
||||
go-version-file: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.mod
|
||||
cache-dependency-path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.sum
|
||||
go-version: 1.17.10
|
||||
|
||||
- name: Check Go Modules, Generated Go/Python codes, and Format
|
||||
run: make check
|
||||
# Verify that go.mod and go.sum is synchronized
|
||||
- name: Check Go modules
|
||||
run: |
|
||||
go mod tidy &&
|
||||
git add go.* &&
|
||||
git diff --cached --exit-code || (echo 'Please run "go mod tidy" to sync Go modules' && exit 1)
|
||||
|
||||
- name: Run Generate And Go Format Test
|
||||
run: |
|
||||
go mod download &&
|
||||
make check &&
|
||||
git add pkg/apis hack/gen-python-sdk &&
|
||||
git diff --cached --exit-code || (echo 'Please run "make check" to generate codes and to format Go codes' && exit 1)
|
||||
|
||||
unittests:
|
||||
name: Unit Test
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
GOPATH: ${{ github.workspace }}/go
|
||||
defaults:
|
||||
|
@ -43,15 +48,14 @@ jobs:
|
|||
working-directory: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
|
||||
|
||||
- name: Setup Go
|
||||
uses: actions/setup-go@v5
|
||||
uses: actions/setup-go@v2
|
||||
with:
|
||||
go-version-file: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.mod
|
||||
cache-dependency-path: ${{ env.GOPATH }}/src/github.com/kubeflow/katib/go.sum
|
||||
go-version: 1.17.10
|
||||
|
||||
- name: Run Go test
|
||||
run: go mod download && make test ENVTEST_K8S_VERSION=${{ matrix.kubernetes-version }}
|
||||
|
@ -61,19 +65,9 @@ jobs:
|
|||
with:
|
||||
path-to-profile: coverage.out
|
||||
working-directory: ${{ env.GOPATH }}/src/github.com/kubeflow/katib
|
||||
parallel: true
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
# Detail: `setup-envtest list`
|
||||
kubernetes-version: ["1.29.3", "1.30.0", "1.31.0"]
|
||||
|
||||
# notifies that all test jobs are finished.
|
||||
finish:
|
||||
needs: unittests
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- uses: shogo82148/actions-goveralls@v1
|
||||
with:
|
||||
parallel-finished: true
|
||||
# Detail: `setup-envtest list --arch amd64`
|
||||
kubernetes-version: ["1.21.4", "1.22.1", "1.23.5"]
|
||||
|
|
|
@ -1,30 +0,0 @@
|
|||
name: Lint Files
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
name: Lint
|
||||
runs-on: ubuntu-22.04
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: 3.9
|
||||
|
||||
- name: Check shell scripts
|
||||
run: make shellcheck
|
||||
|
||||
- name: Run pre-commit
|
||||
uses: pre-commit/action@v3.0.1
|
|
@ -1,101 +1,24 @@
|
|||
name: Frontend Test
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths:
|
||||
- pkg/ui/v1beta1/frontend/**
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
- push
|
||||
- pull_request
|
||||
|
||||
jobs:
|
||||
test:
|
||||
name: Code format and lint
|
||||
runs-on: ubuntu-22.04
|
||||
name: Test
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Node
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v2
|
||||
with:
|
||||
node-version: 16.20.2
|
||||
node-version: 12.18.1
|
||||
|
||||
- name: Format katib code
|
||||
- name: Run Node test
|
||||
run: |
|
||||
npm install prettier --prefix ./pkg/ui/v1beta1/frontend
|
||||
npm install prettier --prefix ./pkg/new-ui/v1beta1/frontend
|
||||
make prettier-check
|
||||
|
||||
- name: Lint katib code
|
||||
run: |
|
||||
cd pkg/ui/v1beta1/frontend
|
||||
npm run lint-check
|
||||
|
||||
frontend-unit-tests:
|
||||
name: Frontend Unit Tests
|
||||
runs-on: ubuntu-22.04
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version: 16.20.2
|
||||
|
||||
- name: Fetch Kubeflow and install common code dependencies
|
||||
run: |
|
||||
COMMIT=$(cat pkg/ui/v1beta1/frontend/COMMIT)
|
||||
cd /tmp && git clone https://github.com/kubeflow/kubeflow.git
|
||||
cd kubeflow
|
||||
git checkout $COMMIT
|
||||
cd components/crud-web-apps/common/frontend/kubeflow-common-lib
|
||||
npm i
|
||||
npm run build
|
||||
npm link ./dist/kubeflow
|
||||
|
||||
- name: Install KWA dependencies
|
||||
run: |
|
||||
cd pkg/ui/v1beta1/frontend
|
||||
npm i
|
||||
npm link kubeflow
|
||||
|
||||
- name: Run unit tests
|
||||
run: |
|
||||
cd pkg/ui/v1beta1/frontend
|
||||
npm run test:prod
|
||||
|
||||
frontend-ui-tests:
|
||||
name: UI tests with Cypress
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
- name: Setup node version to 16
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version: 16
|
||||
|
||||
- name: Fetch Kubeflow and install common code dependencies
|
||||
run: |
|
||||
COMMIT=$(cat pkg/ui/v1beta1/frontend/COMMIT)
|
||||
cd /tmp && git clone https://github.com/kubeflow/kubeflow.git
|
||||
cd kubeflow
|
||||
git checkout $COMMIT
|
||||
cd components/crud-web-apps/common/frontend/kubeflow-common-lib
|
||||
npm i
|
||||
npm run build
|
||||
npm link ./dist/kubeflow
|
||||
- name: Install KWA dependencies
|
||||
run: |
|
||||
cd pkg/ui/v1beta1/frontend
|
||||
npm i
|
||||
npm link kubeflow
|
||||
- name: Serve UI & run Cypress tests in Chrome and Firefox
|
||||
run: |
|
||||
cd pkg/ui/v1beta1/frontend
|
||||
npm run start & npx wait-on http://localhost:4200
|
||||
npm run ui-test-ci-all
|
||||
|
|
|
@ -1,47 +1,22 @@
|
|||
name: Python Test
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
- push
|
||||
- pull_request
|
||||
|
||||
jobs:
|
||||
test:
|
||||
name: Test
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: 3.11
|
||||
|
||||
- name: Run Python test
|
||||
run: make pytest
|
||||
|
||||
# The skopt service doesn't work appropriately with Python 3.11.
|
||||
# So, we need to run the test with Python 3.9.
|
||||
# TODO (tenzen-y): Once we stop to support skopt, we can remove this test.
|
||||
# REF: https://github.com/kubeflow/katib/issues/2280
|
||||
test-skopt:
|
||||
name: Test Skopt
|
||||
runs-on: ubuntu-22.04
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
uses: actions/setup-python@v2
|
||||
with:
|
||||
python-version: 3.9
|
||||
|
||||
- name: Run Python test
|
||||
run: make pytest-skopt
|
||||
run: make pytest
|
||||
|
|
|
@ -0,0 +1,17 @@
|
|||
name: Shellcheck
|
||||
|
||||
on:
|
||||
- push
|
||||
- pull_request
|
||||
|
||||
jobs:
|
||||
test:
|
||||
name: Test
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Run shellcheck
|
||||
run: make shellcheck
|
|
@ -1,21 +1,17 @@
|
|||
name: E2E Test with tf-mnist-with-summaries
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths-ignore:
|
||||
- "pkg/ui/v1beta1/frontend/**"
|
||||
- pull_request
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
jobs:
|
||||
e2e:
|
||||
runs-on: ubuntu-22.04
|
||||
runs-on: ubuntu-20.04
|
||||
timeout-minutes: 120
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Setup Test Env
|
||||
uses: ./.github/workflows/template-setup-e2e-test
|
||||
|
@ -33,6 +29,8 @@ jobs:
|
|||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
kubernetes-version: ["v1.29.2", "v1.30.7", "v1.31.3"]
|
||||
# TODO (tenzen-y): We need to consider running tests on more kubernetes versions.
|
||||
# kubernetes-version: ["v1.20.15", "v1.21.13", "v1.22.10", "v1.23.7", "v1.24.1"]
|
||||
kubernetes-version: ["v1.21.13", "v1.22.10", "v1.23.7"]
|
||||
# Comma Delimited
|
||||
experiments: ["tfjob-mnist-with-summaries"]
|
|
@ -78,6 +78,3 @@ $RECYCLE.BIN/
|
|||
|
||||
## Vendor dir
|
||||
vendor
|
||||
|
||||
# Jupyter Notebooks.
|
||||
**/.ipynb_checkpoints
|
||||
|
|
|
@ -1,38 +0,0 @@
|
|||
repos:
|
||||
- repo: https://github.com/pre-commit/pre-commit-hooks
|
||||
rev: v2.3.0
|
||||
hooks:
|
||||
- id: check-yaml
|
||||
args: [--allow-multiple-documents]
|
||||
- id: check-json
|
||||
- repo: https://github.com/pycqa/isort
|
||||
rev: 5.11.5
|
||||
hooks:
|
||||
- id: isort
|
||||
name: isort
|
||||
entry: isort --profile black
|
||||
- repo: https://github.com/psf/black
|
||||
rev: 24.2.0
|
||||
hooks:
|
||||
- id: black
|
||||
files: (sdk|examples|pkg)/.*
|
||||
- repo: https://github.com/pycqa/flake8
|
||||
rev: 7.1.1
|
||||
hooks:
|
||||
- id: flake8
|
||||
files: (sdk|examples|pkg)/.*
|
||||
exclude: |
|
||||
(?x)^(
|
||||
.*zz_generated.deepcopy.*|
|
||||
.*pb.go|
|
||||
pkg/apis/manager/.*pb2(?:_grpc)?.py(?:i)?|
|
||||
pkg/apis/v1beta1/openapi_generated.go|
|
||||
pkg/mock/.*|
|
||||
pkg/client/controller/.*|
|
||||
sdk/python/v1beta1/kubeflow/katib/configuration.py|
|
||||
sdk/python/v1beta1/kubeflow/katib/rest.py|
|
||||
sdk/python/v1beta1/kubeflow/katib/__init__.py|
|
||||
sdk/python/v1beta1/kubeflow/katib/exceptions.py|
|
||||
sdk/python/v1beta1/kubeflow/katib/api_client.py|
|
||||
sdk/python/v1beta1/kubeflow/katib/models/.*
|
||||
)$
|
|
@ -11,10 +11,8 @@ Please keep the list in alphabetical order.
|
|||
| [babylon health](https://www.babylonhealth.com/) | [@jeremievallee](https://github.com/jeremievallee) | Hyperparameter tuning for AIR internal AI Platform |
| [caicloud](https://caicloud.io/) | [@gaocegege](https://github.com/gaocegege) | Hyperparameter tuning in Caicloud Cloud-Native AI Platform |
| [canonical](https://ubuntu.com/) | [@RFMVasconcelos](https://github.com/rfmvasconcelos) | Hyperparameter tuning for customer projects in Defense and Fintech |
| [CERN](https://home.cern/) | [@d-gol](https://github.com/d-gol) | Hyperparameter tuning within the ML platform on private cloud |
| [cisco](https://cisco.com/) | [@ramdootp](https://github.com/ramdootp) | Hyperparameter tuning for conversational AI interface using Rasa |
| [cubonacci](https://www.cubonacci.com) | [@janvdvegt](https://github.com/janvdvegt) | Hyperparameter tuning within the Cubonacci machine learning platform |
| [CyberAgent](https://www.cyberagent.co.jp/en/) | [@tenzen-y](https://github.com/tenzen-y) | Experiment in CyberAgent internal ML Platform on Private Cloud |
| [fuzhi](http://www.fuzhi.ai/) | [@planck0591](https://github.com/planck0591) | Experiment and Trial in autoML Platform |
| [karrot](https://uk.karrotmarket.com/) | [@muik](https://github.com/muik) | Hyperparameter tuning in Karrot ML Platform |
| [PITS Global Data Recovery Services](https://www.pitsdatarecovery.net/) | [@pheianox](https://github.com/pheianox) | CyberAgent and ML Platform |
|
||||
|
|
849
CHANGELOG.md
|
@ -1,821 +1,6 @@
|
|||
# Changelog
|
||||
|
||||
# [v0.18.0](https://github.com/kubeflow/katib/tree/v0.18.0) (2025-03-25)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Move Katib manifest image references to ghcr ([#2535](https://github.com/kubeflow/katib/pull/2535) by [@saileshd1402](https://github.com/saileshd1402))
|
||||
- Migrate docker images to ghcr ([#2531](https://github.com/kubeflow/katib/pull/2531) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- Upgrade Kubernetes to v1.31.3 ([#2478](https://github.com/kubeflow/katib/pull/2478) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Upgrade Kubernetes to v1.30.7 ([#2463](https://github.com/kubeflow/katib/pull/2463) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Drop Python 3.7 and Support Python 3.11 in the SDK ([#2337](https://github.com/kubeflow/katib/pull/2337) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## New Features
|
||||
|
||||
### Hyperparameter Optimization for LLMs
|
||||
|
||||
- [DOCS] move llm hyperparameter optimisation design image to the proposal directory and rename it ([#2472](https://github.com/kubeflow/katib/pull/2472) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- [GSoC] Update `tune` API for LLM hyperparameters optimization ([#2393](https://github.com/kubeflow/katib/pull/2393) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- [GSoC] Create LLM Hyperparameters Optimization API Proposal ([#2333](https://github.com/kubeflow/katib/pull/2333) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
|
||||
### Support for Advanced Distributions for HPO
|
||||
|
||||
- [GSOC] `optuna` suggestion service logic update ([#2446](https://github.com/kubeflow/katib/pull/2446) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] `hyperopt` suggestion service logic update ([#2412](https://github.com/kubeflow/katib/pull/2412) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] Add validator for feasible space distribution ([#2404](https://github.com/kubeflow/katib/pull/2404) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] added Unknown distribution and convertDistribution in suggestion client ([#2403](https://github.com/kubeflow/katib/pull/2403) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] Support for various Parameter distributions in Katib ([#2334](https://github.com/kubeflow/katib/pull/2334) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSoC] Added `DistributionType` to Experiment API ([#2377](https://github.com/kubeflow/katib/pull/2377) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
|
||||
### Push-based Metrics Collector
|
||||
|
||||
- [GSoC] Provide a PyTorch MNIST Example for Push-based Metrics Collection ([#2437](https://github.com/kubeflow/katib/pull/2437) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] Compatibility Changes in Trial Controller ([#2394](https://github.com/kubeflow/katib/pull/2394) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] New Interface `report_metrics` in Python SDK ([#2371](https://github.com/kubeflow/katib/pull/2371) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] KEP for Project 6: Push-based Metrics Collection for Katib ([#2328](https://github.com/kubeflow/katib/pull/2328) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] Add New Parameter in `tune` ([#2369](https://github.com/kubeflow/katib/pull/2369) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
|
||||
### SDK Updates
|
||||
|
||||
- [SDK] Support PyTorchJob as a Trial Worker ([#2512](https://github.com/kubeflow/katib/pull/2512) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] test: Add e2e test for tune function. ([#2399](https://github.com/kubeflow/katib/pull/2399) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [SDK] improve PVC creation name error ([#2496](https://github.com/kubeflow/katib/pull/2496) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- [SDK] Fix empty list for env variables and numpy version ([#2360](https://github.com/kubeflow/katib/pull/2360) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Explain Python version support cycle ([#2354](https://github.com/kubeflow/katib/pull/2354) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- fix(webhook): fix validation message in experiment webhook ([#2507](https://github.com/kubeflow/katib/pull/2507) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Install typing-extensions v4.10.0 to fix Python test error ([#2504](https://github.com/kubeflow/katib/pull/2504) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- [SDK] Update `tune` API ([#2497](https://github.com/kubeflow/katib/pull/2497) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- fix(api): resolve all api voilation exceptions in katib api ([#2482](https://github.com/kubeflow/katib/pull/2482) by [@truc0](https://github.com/truc0))
|
||||
- fix(trial): use propagated gomega to improve debuggability. ([#2432](https://github.com/kubeflow/katib/pull/2432) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- fix(ui): update None Collector with Push Collector. ([#2418](https://github.com/kubeflow/katib/pull/2418) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- fix: Resolve errors in e2e tests for cypress in Katib UI ([#2384](https://github.com/kubeflow/katib/pull/2384) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- doc(example): fix the broken link. ([#2433](https://github.com/kubeflow/katib/pull/2433) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- fix: remove remaining MXNet dependency. ([#2456](https://github.com/kubeflow/katib/pull/2456) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Remove Dropout layer from ENAS Trial container to fix E2E tests ([#2455](https://github.com/kubeflow/katib/pull/2455) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] fix grpc related bugs in Python SDK ([#2398](https://github.com/kubeflow/katib/pull/2398) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [SDK] Fix types error ([#2424](https://github.com/kubeflow/katib/pull/2424) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- fix: remove the dependency of `protocmp` in `google.golang.org/protobuf/testing/protocmp`. ([#2391](https://github.com/kubeflow/katib/pull/2391) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Fix TestReconcileBatchJob ([#2350](https://github.com/kubeflow/katib/pull/2350) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Fix apple silicon rosetta error when building images from the source code ([#2342](https://github.com/kubeflow/katib/pull/2342) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- fix katib use crds token pipeline trail template guide ([#2330](https://github.com/kubeflow/katib/pull/2330) by [@Jerry-yz](https://github.com/Jerry-yz))
|
||||
- Fix Scikit-Learn Version for Skopt Tests ([#2336](https://github.com/kubeflow/katib/pull/2336) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Support old-style TensorFlow events (tensorboard) ([#2517](https://github.com/kubeflow/katib/pull/2517) by [@garymm](https://github.com/garymm))
|
||||
- Set experiment names at a max of 40 characters. ([#2468](https://github.com/kubeflow/katib/pull/2468) by [@AydanPirani](https://github.com/AydanPirani))
|
||||
- [CI] optimize katib ui dockerfile ([#2505](https://github.com/kubeflow/katib/pull/2505) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- Sort experiments by descending creation date by default in katib-ui ([#2498](https://github.com/kubeflow/katib/pull/2498) by [@Doris-xm](https://github.com/Doris-xm))
|
||||
- [GSoC] Add unit tests for `tune` API ([#2423](https://github.com/kubeflow/katib/pull/2423) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- Update MutatingWebhookConfiguration: Switch from objectSelector to AdmissionWebhookMatchConditions ([#2241](https://github.com/kubeflow/katib/pull/2241) by [@lianghao208](https://github.com/lianghao208))
|
||||
- chore: supporting the listen-address parameter on db-manager ([#2465](https://github.com/kubeflow/katib/pull/2465) by [@caiofralmeida](https://github.com/caiofralmeida))
|
||||
- Upgrade klog to v2 ([#2470](https://github.com/kubeflow/katib/pull/2470) by [@Doris-xm](https://github.com/Doris-xm))
|
||||
- Ignore cache exporting errors in the image building workflows ([#2487](https://github.com/kubeflow/katib/pull/2487) by [@Doris-xm](https://github.com/Doris-xm))
|
||||
- Upgrade grpcio version to v1.64.1 ([#2483](https://github.com/kubeflow/katib/pull/2483) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- docs: remove katib workflow ([#2443](https://github.com/kubeflow/katib/pull/2443) by [@gonmmarques](https://github.com/gonmmarques))
|
||||
- Migrate KatibCertGenerator to OPA CertController ([#2345](https://github.com/kubeflow/katib/pull/2345) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Promote @Electronic-Waste and @helenxie-bit as Katib reviewers ([#2439](https://github.com/kubeflow/katib/pull/2439) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update README and out-of-date docs ([#2438](https://github.com/kubeflow/katib/pull/2438) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Changes isort profile to black, to be fully compatible and adds 'pkg' dir for black and flake8 ([#2413](https://github.com/kubeflow/katib/pull/2413) by [@Ygnas](https://github.com/Ygnas))
|
||||
- Introduced error constants and replaced reflect with cmp ([#2289](https://github.com/kubeflow/katib/pull/2289) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- [Test] Refactor `inject_webhook_test.go` according to the Developer Guide ([#2401](https://github.com/kubeflow/katib/pull/2401) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Enhance pre-commit hooks with flake8 and black ([#2407](https://github.com/kubeflow/katib/pull/2407) by [@Ygnas](https://github.com/Ygnas))
|
||||
- added `Distribution` field to feasibleSpace in `api.proto` ([#2397](https://github.com/kubeflow/katib/pull/2397) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- Begin enabling pre-commit hooks ([#2242](https://github.com/kubeflow/katib/pull/2242) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
- Update Instructions for Argo Workflows ([#2382](https://github.com/kubeflow/katib/pull/2382) by [@jaffe-fly](https://github.com/jaffe-fly))
|
||||
- docs: update suggestion.md ([#2387](https://github.com/kubeflow/katib/pull/2387) by [@eltociear](https://github.com/eltociear))
|
||||
- Add command to re-run GitHub Actions tests ([#2385](https://github.com/kubeflow/katib/pull/2385) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Bump Katib Python SDK to 0.17.0 version ([#2379](https://github.com/kubeflow/katib/pull/2379) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.17.0 ([#2380](https://github.com/kubeflow/katib/pull/2380) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Replaced hpcloud with nxadm for tail package in Go ([#2375](https://github.com/kubeflow/katib/pull/2375) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- Use ErrorList for experiment validator ([#2329](https://github.com/kubeflow/katib/pull/2329) by [@ckcd](https://github.com/ckcd))
|
||||
- Add Changelog for Katib v0.17.0-rc.1 ([#2370](https://github.com/kubeflow/katib/pull/2370) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Remove default caBundle value ([#2368](https://github.com/kubeflow/katib/pull/2368) by [@vihangm](https://github.com/vihangm))
|
||||
- Bump Katib Python SDK to 0.17.0rc1 version ([#2365](https://github.com/kubeflow/katib/pull/2365) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add unit test for `create_experiment` in the `katib_client` module ([#2325](https://github.com/kubeflow/katib/pull/2325) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- Remove code generation from release script ([#2363](https://github.com/kubeflow/katib/pull/2363) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Upgrade the protobuf version to >=4.21.12,<5 ([#2358](https://github.com/kubeflow/katib/pull/2358) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Replace gRPC code generation tool from Znly/protoc to Buf ([#2344](https://github.com/kubeflow/katib/pull/2344) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Replace already closed github.com/golang/mock with go.uber.org/mock ([#2357](https://github.com/kubeflow/katib/pull/2357) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Use cache-dependency-path in actions/setup-go for CI workflow ([#2355](https://github.com/kubeflow/katib/pull/2355) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Update Slack Invitation ([#2349](https://github.com/kubeflow/katib/pull/2349) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update GitHub template to better triage Issues ([#2335](https://github.com/kubeflow/katib/pull/2335) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.17.0-rc.0 ([#2319](https://github.com/kubeflow/katib/pull/2319) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update outdated actions ([#2324](https://github.com/kubeflow/katib/pull/2324) by [@Mersho](https://github.com/Mersho))
|
||||
- Make test fields private in Go unit tests ([#2316](https://github.com/kubeflow/katib/pull/2316) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- Bump Katib Python SDK to 0.17.0rc0 Version ([#2318](https://github.com/kubeflow/katib/pull/2318) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.17.0...v0.18.0)
|
||||
|
||||
# [v0.18.0-rc.0](https://github.com/kubeflow/katib/tree/v0.18.0-rc.0) (2025-02-13)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Upgrade Kubernetes to v1.31.3 ([#2478](https://github.com/kubeflow/katib/pull/2478) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Upgrade Kubernetes to v1.30.7 ([#2463](https://github.com/kubeflow/katib/pull/2463) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Drop Python 3.7 and Support Python 3.11 in the SDK ([#2337](https://github.com/kubeflow/katib/pull/2337) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## New Features
|
||||
|
||||
### Hyperparameter Optimization for LLMs
|
||||
|
||||
- [DOCS] move llm hyperparameter optimisation design image to the proposal directory and rename it ([#2472](https://github.com/kubeflow/katib/pull/2472) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- [GSoC] Update `tune` API for LLM hyperparameters optimization ([#2393](https://github.com/kubeflow/katib/pull/2393) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- [GSoC] Create LLM Hyperparameters Optimization API Proposal ([#2333](https://github.com/kubeflow/katib/pull/2333) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
|
||||
### Support for Advanced Distributions for HPO
|
||||
|
||||
- [GSOC] `optuna` suggestion service logic update ([#2446](https://github.com/kubeflow/katib/pull/2446) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] `hyperopt` suggestion service logic update ([#2412](https://github.com/kubeflow/katib/pull/2412) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] Add validator for feasible space distribution ([#2404](https://github.com/kubeflow/katib/pull/2404) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] added Unknown distribution and convertDistribution in suggestion client ([#2403](https://github.com/kubeflow/katib/pull/2403) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSOC] Support for various Parameter distributions in Katib ([#2334](https://github.com/kubeflow/katib/pull/2334) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- [GSoC] Added `DistributionType` to Experiment API ([#2377](https://github.com/kubeflow/katib/pull/2377) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
|
||||
### Push-based Metrics Collector
|
||||
|
||||
- [GSoC] Provide a PyTorch MNIST Example for Push-based Metrics Collection ([#2437](https://github.com/kubeflow/katib/pull/2437) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] Compatibility Changes in Trial Controller ([#2394](https://github.com/kubeflow/katib/pull/2394) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] New Interface `report_metrics` in Python SDK ([#2371](https://github.com/kubeflow/katib/pull/2371) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] KEP for Project 6: Push-based Metrics Collection for Katib ([#2328](https://github.com/kubeflow/katib/pull/2328) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [GSoC] Add New Parameter in `tune` ([#2369](https://github.com/kubeflow/katib/pull/2369) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
|
||||
### SDK Updates
|
||||
|
||||
- [SDK] Support PyTorchJob as a Trial Worker ([#2512](https://github.com/kubeflow/katib/pull/2512) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] test: Add e2e test for tune function. ([#2399](https://github.com/kubeflow/katib/pull/2399) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [SDK] improve PVC creation name error ([#2496](https://github.com/kubeflow/katib/pull/2496) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- [SDK] Fix empty list for env variables and numpy version ([#2360](https://github.com/kubeflow/katib/pull/2360) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Explain Python version support cycle ([#2354](https://github.com/kubeflow/katib/pull/2354) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- fix(webhook): fix validation message in experiment webhook ([#2507](https://github.com/kubeflow/katib/pull/2507) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Install typing-extensions v4.10.0 to fix Python test error ([#2504](https://github.com/kubeflow/katib/pull/2504) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- [SDK] Update `tune` API ([#2497](https://github.com/kubeflow/katib/pull/2497) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- fix(api): resolve all api violation exceptions in katib api ([#2482](https://github.com/kubeflow/katib/pull/2482) by [@truc0](https://github.com/truc0))
|
||||
- fix(trial): use propagated gomega to improve debuggability. ([#2432](https://github.com/kubeflow/katib/pull/2432) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- fix(ui): update None Collector with Push Collector. ([#2418](https://github.com/kubeflow/katib/pull/2418) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- fix: Resolve errors in e2e tests for cypress in Katib UI ([#2384](https://github.com/kubeflow/katib/pull/2384) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- doc(example): fix the broken link. ([#2433](https://github.com/kubeflow/katib/pull/2433) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- fix: remove remaining MXNet dependency. ([#2456](https://github.com/kubeflow/katib/pull/2456) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Remove Dropout layer from ENAS Trial container to fix E2E tests ([#2455](https://github.com/kubeflow/katib/pull/2455) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] fix grpc related bugs in Python SDK ([#2398](https://github.com/kubeflow/katib/pull/2398) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- [SDK] Fix types error ([#2424](https://github.com/kubeflow/katib/pull/2424) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- fix: remove the dependency of `protocmp` in `google.golang.org/protobuf/testing/protocmp`. ([#2391](https://github.com/kubeflow/katib/pull/2391) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Fix TestReconcileBatchJob ([#2350](https://github.com/kubeflow/katib/pull/2350) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Fix apple silicon rosetta error when building images from the source code ([#2342](https://github.com/kubeflow/katib/pull/2342) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- fix katib use crds token pipeline trial template guide ([#2330](https://github.com/kubeflow/katib/pull/2330) by [@Jerry-yz](https://github.com/Jerry-yz))
|
||||
- Fix Scikit-Learn Version for Skopt Tests ([#2336](https://github.com/kubeflow/katib/pull/2336) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Set experiment names at a max of 40 characters. ([#2468](https://github.com/kubeflow/katib/pull/2468) by [@AydanPirani](https://github.com/AydanPirani))
|
||||
- [CI] optimize katib ui dockerfile ([#2505](https://github.com/kubeflow/katib/pull/2505) by [@mahdikhashan](https://github.com/mahdikhashan))
|
||||
- Sort experiments by descending creation date by default in katib-ui ([#2498](https://github.com/kubeflow/katib/pull/2498) by [@Doris-xm](https://github.com/Doris-xm))
|
||||
- [GSoC] Add unit tests for `tune` API ([#2423](https://github.com/kubeflow/katib/pull/2423) by [@helenxie-bit](https://github.com/helenxie-bit))
|
||||
- Update MutatingWebhookConfiguration: Switch from objectSelector to AdmissionWebhookMatchConditions ([#2241](https://github.com/kubeflow/katib/pull/2241) by [@lianghao208](https://github.com/lianghao208))
|
||||
- chore: supporting the listen-address parameter on db-manager ([#2465](https://github.com/kubeflow/katib/pull/2465) by [@caiofralmeida](https://github.com/caiofralmeida))
|
||||
- Upgrade klog to v2 ([#2470](https://github.com/kubeflow/katib/pull/2470) by [@Doris-xm](https://github.com/Doris-xm))
|
||||
- Ignore cache exporting errors in the image building workflows ([#2487](https://github.com/kubeflow/katib/pull/2487) by [@Doris-xm](https://github.com/Doris-xm))
|
||||
- Upgrade grpcio version to v1.64.1 ([#2483](https://github.com/kubeflow/katib/pull/2483) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- docs: remove katib workflow ([#2443](https://github.com/kubeflow/katib/pull/2443) by [@gonmmarques](https://github.com/gonmmarques))
|
||||
- Migrate KatibCertGenerator to OPA CertController ([#2345](https://github.com/kubeflow/katib/pull/2345) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Promote @Electronic-Waste and @helenxie-bit as Katib reviewers ([#2439](https://github.com/kubeflow/katib/pull/2439) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update README and out-of-date docs ([#2438](https://github.com/kubeflow/katib/pull/2438) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Changes isort profile to black, to be fully compatible and adds 'pkg' dir for black and flake8 ([#2413](https://github.com/kubeflow/katib/pull/2413) by [@Ygnas](https://github.com/Ygnas))
|
||||
- Introduced error constants and replaced reflect with cmp ([#2289](https://github.com/kubeflow/katib/pull/2289) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- [Test] Refactor `inject_webhook_test.go` according to the Developer Guide ([#2401](https://github.com/kubeflow/katib/pull/2401) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Enhance pre-commit hooks with flake8 and black ([#2407](https://github.com/kubeflow/katib/pull/2407) by [@Ygnas](https://github.com/Ygnas))
|
||||
- added `Distribution` field to feasibleSpace in `api.proto` ([#2397](https://github.com/kubeflow/katib/pull/2397) by [@shashank-iitbhu](https://github.com/shashank-iitbhu))
|
||||
- Begin enabling pre-commit hooks ([#2242](https://github.com/kubeflow/katib/pull/2242) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
- Update Instructions for Argo Workflows ([#2382](https://github.com/kubeflow/katib/pull/2382) by [@jaffe-fly](https://github.com/jaffe-fly))
|
||||
- docs: update suggestion.md ([#2387](https://github.com/kubeflow/katib/pull/2387) by [@eltociear](https://github.com/eltociear))
|
||||
- Add command to re-run GitHub Actions tests ([#2385](https://github.com/kubeflow/katib/pull/2385) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Bump Katib Python SDK to 0.17.0 version ([#2379](https://github.com/kubeflow/katib/pull/2379) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.17.0 ([#2380](https://github.com/kubeflow/katib/pull/2380) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Replaced hpcloud with nxadm for tail package in Go ([#2375](https://github.com/kubeflow/katib/pull/2375) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- Use ErrorList for experiment validator ([#2329](https://github.com/kubeflow/katib/pull/2329) by [@ckcd](https://github.com/ckcd))
|
||||
- Add Changelog for Katib v0.17.0-rc.1 ([#2370](https://github.com/kubeflow/katib/pull/2370) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Remove default caBundle value ([#2368](https://github.com/kubeflow/katib/pull/2368) by [@vihangm](https://github.com/vihangm))
|
||||
- Bump Katib Python SDK to 0.17.0rc1 version ([#2365](https://github.com/kubeflow/katib/pull/2365) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add unit test for `create_experiment` in the `katib_client` module ([#2325](https://github.com/kubeflow/katib/pull/2325) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- Remove code generation from release script ([#2363](https://github.com/kubeflow/katib/pull/2363) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Upgrade the protobuf version to >=4.21.12,<5 ([#2358](https://github.com/kubeflow/katib/pull/2358) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Replace gRPC code generation tool from Znly/protoc to Buf ([#2344](https://github.com/kubeflow/katib/pull/2344) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Replace already closed github.com/golang/mock with go.uber.org/mock ([#2357](https://github.com/kubeflow/katib/pull/2357) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Use cache-dependency-path in actions/setup-go for CI workflow ([#2355](https://github.com/kubeflow/katib/pull/2355) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Update Slack Invitation ([#2349](https://github.com/kubeflow/katib/pull/2349) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update GitHub template to better triage Issues ([#2335](https://github.com/kubeflow/katib/pull/2335) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.17.0-rc.0 ([#2319](https://github.com/kubeflow/katib/pull/2319) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update outdated actions ([#2324](https://github.com/kubeflow/katib/pull/2324) by [@Mersho](https://github.com/Mersho))
|
||||
- Make test fields private in Go unit tests ([#2316](https://github.com/kubeflow/katib/pull/2316) by [@tariq-hasan](https://github.com/tariq-hasan))
|
||||
- Bump Katib Python SDK to 0.17.0rc0 Version ([#2318](https://github.com/kubeflow/katib/pull/2318) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.17.0...v0.18.0-rc.0)
|
||||
|
||||
# [v0.17.0](https://github.com/kubeflow/katib/tree/v0.17.0) (2024-07-12)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- [SDK] Drop Python 3.7 and Support Python 3.11 ([#2337](https://github.com/kubeflow/katib/pull/2337) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [SDK] Upgrade the protobuf version to >=4.21.12,<5 ([#2358](https://github.com/kubeflow/katib/pull/2358) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.26, and support Kubernetes v1.29 ([#2308](https://github.com/kubeflow/katib/pull/2308) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.25, and Support Kubernetes v1.28 ([#2303](https://github.com/kubeflow/katib/pull/2303) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove MXNet examples ([#2267](https://github.com/kubeflow/katib/pull/2267) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## New Features
|
||||
|
||||
### Core Features
|
||||
|
||||
- Replace gRPC code generation tool from Znly/protoc to Buf ([#2344](https://github.com/kubeflow/katib/pull/2344) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Support ARM64 arch for release images ([#2315](https://github.com/kubeflow/katib/pull/2315) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- DB: Add environment variable option to skip DB table creation ([#2245](https://github.com/kubeflow/katib/pull/2245) by [@lkaybob](https://github.com/lkaybob))
|
||||
- Add environment variable option to set postgres ssl mode ([#2266](https://github.com/kubeflow/katib/pull/2266) by [@ckcd](https://github.com/ckcd))
|
||||
- Upgrade TensorFlow version to v2.16.1 ([#2282](https://github.com/kubeflow/katib/pull/2282) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade PyTorch version to v2.2.1 ([#2279](https://github.com/kubeflow/katib/pull/2279) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
### SDK Features
|
||||
|
||||
- [SDK] Generate Name functionality for creating experiments. ([#2272](https://github.com/kubeflow/katib/pull/2272) by [@bharathk005](https://github.com/bharathk005))
|
||||
- [SDK] Add `env` & `env_from` in client tune ([#2235](https://github.com/kubeflow/katib/pull/2235) by [@shipengcheng1230](https://github.com/shipengcheng1230))
|
||||
- [SDK] Add 'algorithm_settings' in client tune ([#2227](https://github.com/kubeflow/katib/pull/2227) by [@shipengcheng1230](https://github.com/shipengcheng1230))
|
||||
- [SDK] Raise more human-readable name conflict exception ([#2199](https://github.com/kubeflow/katib/pull/2199) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Remove code generation from release script ([#2364](https://github.com/kubeflow/katib/pull/2364) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Fix empty list for env variables and numpy version ([#2360](https://github.com/kubeflow/katib/pull/2360) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Use cache-dependency-path in actions/setup-go for CI workflow ([#2355](https://github.com/kubeflow/katib/pull/2355) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Fix TestReconcileBatchJob ([#2350](https://github.com/kubeflow/katib/pull/2350) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Fix Scikit-Learn Version for Skopt Tests ([#2336](https://github.com/kubeflow/katib/pull/2336) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Fix env per Trial parameter in tune API ([#2304](https://github.com/kubeflow/katib/pull/2304) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Fix: clean up UTs for file metrics collector ([#2285](https://github.com/kubeflow/katib/pull/2285) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Fix tensor devices for DARTS Trial ([#2273](https://github.com/kubeflow/katib/pull/2273) by [@sifa1024](https://github.com/sifa1024))
|
||||
- Typo fix stale.yaml ([#2257](https://github.com/kubeflow/katib/pull/2257) by [@tarilabs](https://github.com/tarilabs))
|
||||
- Fix Optuna Validation for CMA-ES ([#2240](https://github.com/kubeflow/katib/pull/2240) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Replace already closed github.com/golang/mock with go.uber.org/mock ([#2357](https://github.com/kubeflow/katib/pull/2357) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Update outdated actions ([#2324](https://github.com/kubeflow/katib/pull/2324) by [@Mersho](https://github.com/Mersho))
|
||||
- Upgrade Go version to v1.22 ([#2309](https://github.com/kubeflow/katib/pull/2309) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- CI: Enable parallel mode for the coveralls ([#2297](https://github.com/kubeflow/katib/pull/2297) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade Python version to 3.11 ([#2278](https://github.com/kubeflow/katib/pull/2278) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- chore: add unit testcases for files in Text format. ([#2274](https://github.com/kubeflow/katib/pull/2274) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Upgrade google/go-containerregistry/pkg/authn/k8schain ([#2252](https://github.com/kubeflow/katib/pull/2252) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add Technical and style guide to the contribution guide ([#2250](https://github.com/kubeflow/katib/pull/2250) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Install typing-extensions v4.6.3 for Optuna ([#2251](https://github.com/kubeflow/katib/pull/2251) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove legacy BO code ([#2246](https://github.com/kubeflow/katib/pull/2246) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.16.0 ([#2239](https://github.com/kubeflow/katib/pull/2239) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Katib ROADMAP 2022/2023 ([#2153](https://github.com/kubeflow/katib/pull/2153) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update Ubuntu to 22.04 for E2E Tests ([#2222](https://github.com/kubeflow/katib/pull/2222) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Run Stale Action Every 5th Hour ([#2221](https://github.com/kubeflow/katib/pull/2221) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Stale GitHub Action ([#2220](https://github.com/kubeflow/katib/pull/2220) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.16.0-rc.1 ([#2218](https://github.com/kubeflow/katib/pull/2218) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.16.0-rc.0 ([#2204](https://github.com/kubeflow/katib/pull/2204) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Use the controller-runtime logger in the cert-generator ([#2219](https://github.com/kubeflow/katib/pull/2219) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.16.0...v0.17.0)
|
||||
|
||||
# [v0.17.0-rc.1](https://github.com/kubeflow/katib/tree/v0.17.0-rc.1) (2024-06-20)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- [SDK] Drop Python 3.7 and Support Python 3.11 ([#2337](https://github.com/kubeflow/katib/pull/2337) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [SDK] Upgrade the protobuf version to >=4.21.12,<5 ([#2358](https://github.com/kubeflow/katib/pull/2358) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## New Features
|
||||
|
||||
- Replace gRPC code generation tool from Znly/protoc to Buf ([#2344](https://github.com/kubeflow/katib/pull/2344) by [@forsaken628](https://github.com/forsaken628))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Remove code generation from release script ([#2364](https://github.com/kubeflow/katib/pull/2364) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Fix empty list for env variables and numpy version ([#2360](https://github.com/kubeflow/katib/pull/2360) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Use cache-dependency-path in actions/setup-go for CI workflow ([#2355](https://github.com/kubeflow/katib/pull/2355) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Fix TestReconcileBatchJob ([#2350](https://github.com/kubeflow/katib/pull/2350) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Fix Scikit-Learn Version for Skopt Tests ([#2336](https://github.com/kubeflow/katib/pull/2336) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Replace already closed github.com/golang/mock with go.uber.org/mock ([#2357](https://github.com/kubeflow/katib/pull/2357) by [@forsaken628](https://github.com/forsaken628))
|
||||
- Update outdated actions ([#2324](https://github.com/kubeflow/katib/pull/2324) by [@Mersho](https://github.com/Mersho))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.17.0-rc.0...v0.17.0-rc.1)
|
||||
|
||||
# [v0.17.0-rc.0](https://github.com/kubeflow/katib/tree/v0.17.0-rc.0) (2024-04-29)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Drop Kubernetes v1.26, and support Kubernetes v1.29 ([#2308](https://github.com/kubeflow/katib/pull/2308) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.25, and Support Kubernetes v1.28 ([#2303](https://github.com/kubeflow/katib/pull/2303) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## New Features
|
||||
|
||||
### Core Features
|
||||
|
||||
- Support ARM64 arch for release images ([#2315](https://github.com/kubeflow/katib/pull/2315) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- DB: Add environment variable option to skip DB table creation ([#2245](https://github.com/kubeflow/katib/pull/2245) by [@lkaybob](https://github.com/lkaybob))
|
||||
- Add environment variable option to set postgres ssl mode ([#2266](https://github.com/kubeflow/katib/pull/2266) by [@ckcd](https://github.com/ckcd))
|
||||
- Upgrade TensorFlow version to v2.16.1 ([#2282](https://github.com/kubeflow/katib/pull/2282) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade PyTorch version to v2.2.1 ([#2279](https://github.com/kubeflow/katib/pull/2279) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
### SDK Features
|
||||
|
||||
- [SDK] Generate Name functionality for creating experiments. ([#2272](https://github.com/kubeflow/katib/pull/2272) by [@bharathk005](https://github.com/bharathk005))
|
||||
- [SDK] Add `env` & `env_from` in client tune ([#2235](https://github.com/kubeflow/katib/pull/2235) by [@shipengcheng1230](https://github.com/shipengcheng1230))
|
||||
- [SDK] Add 'algorithm_settings' in client tune ([#2227](https://github.com/kubeflow/katib/pull/2227) by [@shipengcheng1230](https://github.com/shipengcheng1230))
|
||||
- [SDK] Raise more human-readable name conflict exception ([#2199](https://github.com/kubeflow/katib/pull/2199) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- [SDK] Fix env per Trial parameter in tune API ([#2304](https://github.com/kubeflow/katib/pull/2304) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Fix: clean up UTs for file metrics collector ([#2285](https://github.com/kubeflow/katib/pull/2285) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Fix tensor devices for DARTS Trial ([#2273](https://github.com/kubeflow/katib/pull/2273) by [@sifa1024](https://github.com/sifa1024))
|
||||
- Typo fix stale.yaml ([#2257](https://github.com/kubeflow/katib/pull/2257) by [@tarilabs](https://github.com/tarilabs))
|
||||
- Fix Optuna Validation for CMA-ES ([#2240](https://github.com/kubeflow/katib/pull/2240) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Upgrade Go version to v1.22 ([#2309](https://github.com/kubeflow/katib/pull/2309) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- CI: Enable parallel mode for the coveralls ([#2297](https://github.com/kubeflow/katib/pull/2297) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade Python version to 3.11 ([#2278](https://github.com/kubeflow/katib/pull/2278) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- chore: add unit testcases for files in Text format. ([#2274](https://github.com/kubeflow/katib/pull/2274) by [@Electronic-Waste](https://github.com/Electronic-Waste))
|
||||
- Upgrade google/go-containerregistry/pkg/authn/k8schain ([#2252](https://github.com/kubeflow/katib/pull/2252) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove MXNet examples ([#2267](https://github.com/kubeflow/katib/pull/2267) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add Technical and style guide to the contribution guide ([#2250](https://github.com/kubeflow/katib/pull/2250) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Install typing-extensions v4.6.3 for Optuna ([#2251](https://github.com/kubeflow/katib/pull/2251) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove legacy BO code ([#2246](https://github.com/kubeflow/katib/pull/2246) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.16.0 ([#2239](https://github.com/kubeflow/katib/pull/2239) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Katib ROADMAP 2022/2023 ([#2153](https://github.com/kubeflow/katib/pull/2153) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update Ubuntu to 22.04 for E2E Tests ([#2222](https://github.com/kubeflow/katib/pull/2222) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Run Stale Action Every 5th Hour ([#2221](https://github.com/kubeflow/katib/pull/2221) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Stale GitHub Action ([#2220](https://github.com/kubeflow/katib/pull/2220) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.16.0-rc.1 ([#2218](https://github.com/kubeflow/katib/pull/2218) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.16.0-rc.0 ([#2204](https://github.com/kubeflow/katib/pull/2204) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Use the controller-runtime logger in the cert-generator ([#2219](https://github.com/kubeflow/katib/pull/2219) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.16.0...v0.17.0-rc.0)
|
||||
|
||||
# [v0.16.0](https://github.com/kubeflow/katib/tree/v0.16.0) (2023-10-31)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Implement KatibConfig API ([#2176](https://github.com/kubeflow/katib/pull/2176) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.24 and support Kubernetes v1.27 ([#2182](https://github.com/kubeflow/katib/pull/2182) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.23 and support Kubernetes v1.26 ([#2177](https://github.com/kubeflow/katib/pull/2177) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Change failurePolicy to Fail for Katib Webhooks ([#2018](https://github.com/kubeflow/katib/pull/2018) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## New Features
|
||||
|
||||
### Core Features
|
||||
|
||||
- Consolidate the Katib Cert Generator to the Katib Controller ([#2185](https://github.com/kubeflow/katib/pull/2185) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Containerize tests for Katib Conformance ([#2146](https://github.com/kubeflow/katib/pull/2146) by [@nagar-ajay](https://github.com/nagar-ajay))
|
||||
|
||||
### UI Improvements
|
||||
|
||||
- [UI] Default Resume Policy to never from UI ([#2195](https://github.com/kubeflow/katib/pull/2195) by [@mChowdhury-91](https://github.com/mChowdhury-91))
|
||||
- [UI] Remove Deprecated Katib UI ([#2179](https://github.com/kubeflow/katib/pull/2179) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [UI] Fix Trial Logs when Kubernetes Job Fails ([#2164](https://github.com/kubeflow/katib/pull/2164) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- kwa(front): Support all namespaces ([#2119](https://github.com/kubeflow/katib/pull/2119) by [@elenzio9](https://github.com/elenzio9))
|
||||
- kwa(front): Update the use of SnackBarService ([#2113](https://github.com/kubeflow/katib/pull/2113) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- UI: Remove an unused import, EventV1beta1Api ([#2116](https://github.com/kubeflow/katib/pull/2116) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
### SDK Improvements
|
||||
|
||||
- [SDK] Enable resource specification for trial containers ([#2192](https://github.com/kubeflow/katib/pull/2192) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
- [SDK] Add namespace parameter to KatibClient ([#2183](https://github.com/kubeflow/katib/pull/2183) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
- [SDK] Import all Kubernetes Models ([#2148](https://github.com/kubeflow/katib/pull/2148) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Bug: Wait for the certs to be mounted inside the container ([#2213](https://github.com/kubeflow/katib/pull/2213) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Start waiting for certs to be ready before sending data to the channel ([#2215](https://github.com/kubeflow/katib/pull/2215) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- E2E: Add additional checks to verify if the components are ready ([#2212](https://github.com/kubeflow/katib/pull/2212) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove a katib-webhook-cert Secret from components ([#2214](https://github.com/kubeflow/katib/pull/2214) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Skip to inject the metrics-collector pods to the Katib controller ([#2211](https://github.com/kubeflow/katib/pull/2211) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Sending an empty data to the certsReady channel ([#2196](https://github.com/kubeflow/katib/pull/2196) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Fix conformance docker image ([#2147](https://github.com/kubeflow/katib/pull/2147) by [@nagar-ajay](https://github.com/nagar-ajay))
|
||||
|
||||
## Documentation
|
||||
|
||||
- Add PITS Global Data Recovery Services to the adopters list ([#2160](https://github.com/kubeflow/katib/pull/2160) by [@ghost](https://github.com/ghost))
|
||||
- Add SDK Breaking Change to Changelog ([#2133](https://github.com/kubeflow/katib/pull/2133) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.15.0 ([#2129](https://github.com/kubeflow/katib/pull/2129) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.15.0-rc.1 ([#2123](https://github.com/kubeflow/katib/pull/2123) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.15.0-rc.0 ([#2106](https://github.com/kubeflow/katib/pull/2106) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Upgrade TensorFlow version to v2.13.0 ([#2216](https://github.com/kubeflow/katib/pull/2216) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade Go version to v1.20 ([#2190](https://github.com/kubeflow/katib/pull/2190) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Replace grpc_health_probe with the built-in gRPC container probe feature ([#2189](https://github.com/kubeflow/katib/pull/2189) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Allow install binaries for the arm64 in the envtest ([#2188](https://github.com/kubeflow/katib/pull/2188) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Replace action to setup minikube with medyagh/setup-minikube ([#2178](https://github.com/kubeflow/katib/pull/2178) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove Charmed Operators for Katib ([#2161](https://github.com/kubeflow/katib/pull/2161) by [@ca-scribner](https://github.com/ca-scribner))
|
||||
- Namespace and trial pod annotations as CLI argument ([#2138](https://github.com/kubeflow/katib/pull/2138) by [@nagar-ajay](https://github.com/nagar-ajay))
|
||||
- Relax dependencies restriction for the gRPC libraries ([#2140](https://github.com/kubeflow/katib/pull/2140) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Increase the free spaces in CI ([#2131](https://github.com/kubeflow/katib/pull/2131) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Reformat katib-operators ([#2114](https://github.com/kubeflow/katib/pull/2114) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.15.0...v0.16.0)
|
||||
|
||||
# [v0.16.0-rc.1](https://github.com/kubeflow/katib/tree/v0.16.0-rc.1) (2023-08-16)
|
||||
|
||||
## New Features
|
||||
|
||||
- Upgrade TensorFlow version to v2.13.0 ([#2216](https://github.com/kubeflow/katib/pull/2216) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Bug: Wait for the certs to be mounted inside the container ([#2213](https://github.com/kubeflow/katib/pull/2213) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Start waiting for certs to be ready before sending data to the channel ([#2215](https://github.com/kubeflow/katib/pull/2215) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- E2E: Add additional checks to verify if the components are ready ([#2212](https://github.com/kubeflow/katib/pull/2212) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove a katib-webhook-cert Secret from components ([#2214](https://github.com/kubeflow/katib/pull/2214) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Skip to inject the metrics-collector pods to the Katib controller ([#2211](https://github.com/kubeflow/katib/pull/2211) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.16.0-rc.0...v0.16.0-rc.1)
|
||||
|
||||
# [v0.16.0-rc.0](https://github.com/kubeflow/katib/tree/v0.16.0-rc.0) (2023-08-05)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Implement KatibConfig API ([#2176](https://github.com/kubeflow/katib/pull/2176) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.24 and support Kubernetes v1.27 ([#2182](https://github.com/kubeflow/katib/pull/2182) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.23 and support Kubernetes v1.26 ([#2177](https://github.com/kubeflow/katib/pull/2177) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Change failurePolicy to Fail for Katib Webhooks ([#2018](https://github.com/kubeflow/katib/pull/2018) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## New Features
|
||||
|
||||
### Core Features
|
||||
|
||||
- Consolidate the Katib Cert Generator to the Katib Controller ([#2185](https://github.com/kubeflow/katib/pull/2185) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Containerize tests for Katib Conformance ([#2146](https://github.com/kubeflow/katib/pull/2146) by [@nagar-ajay](https://github.com/nagar-ajay))
|
||||
|
||||
### UI Improvements
|
||||
|
||||
- [UI] Default Resume Policy to never from UI ([#2195](https://github.com/kubeflow/katib/pull/2195) by [@mChowdhury-91](https://github.com/mChowdhury-91))
|
||||
- [UI] Remove Deprecated Katib UI ([#2179](https://github.com/kubeflow/katib/pull/2179) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [UI] Fix Trial Logs when Kubernetes Job Fails ([#2164](https://github.com/kubeflow/katib/pull/2164) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- kwa(front): Support all namespaces ([#2119](https://github.com/kubeflow/katib/pull/2119) by [@elenzio9](https://github.com/elenzio9))
|
||||
- kwa(front): Update the use of SnackBarService ([#2113](https://github.com/kubeflow/katib/pull/2113) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- UI: Remove an unused import, EventV1beta1Api ([#2116](https://github.com/kubeflow/katib/pull/2116) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
### SDK Improvements
|
||||
|
||||
- [SDK] Enable resource specification for trial containers ([#2192](https://github.com/kubeflow/katib/pull/2192) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
- [SDK] Add namespace parameter to KatibClient ([#2183](https://github.com/kubeflow/katib/pull/2183) by [@droctothorpe](https://github.com/droctothorpe))
|
||||
- [SDK] Import all Kubernetes Models ([#2148](https://github.com/kubeflow/katib/pull/2148) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Sending an empty data to the certsReady channel ([#2196](https://github.com/kubeflow/katib/pull/2196) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Fix conformance docker image ([#2147](https://github.com/kubeflow/katib/pull/2147) by [@nagar-ajay](https://github.com/nagar-ajay))
|
||||
|
||||
## Documentation
|
||||
|
||||
- Add PITS Global Data Recovery Services to the adopters list ([#2160](https://github.com/kubeflow/katib/pull/2160) by [@ghost](https://github.com/ghost))
|
||||
- Add SDK Breaking Change to Changelog ([#2133](https://github.com/kubeflow/katib/pull/2133) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.15.0 ([#2129](https://github.com/kubeflow/katib/pull/2129) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.15.0-rc.1 ([#2123](https://github.com/kubeflow/katib/pull/2123) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add Changelog for Katib v0.15.0-rc.0 ([#2106](https://github.com/kubeflow/katib/pull/2106) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Misc
|
||||
|
||||
- Upgrade Go version to v1.20 ([#2190](https://github.com/kubeflow/katib/pull/2190) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Replace grpc_health_probe with the built-in gRPC container probe feature ([#2189](https://github.com/kubeflow/katib/pull/2189) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Allow install binaries for the arm64 in the envtest ([#2188](https://github.com/kubeflow/katib/pull/2188) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Replace action to setup minikube with medyagh/setup-minikube ([#2178](https://github.com/kubeflow/katib/pull/2178) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove Charmed Operators for Katib ([#2161](https://github.com/kubeflow/katib/pull/2161) by [@ca-scribner](https://github.com/ca-scribner))
|
||||
- Namespace and trial pod annotations as CLI argument ([#2138](https://github.com/kubeflow/katib/pull/2138) by [@nagar-ajay](https://github.com/nagar-ajay))
|
||||
- Relax dependencies restriction for the gRPC libraries ([#2140](https://github.com/kubeflow/katib/pull/2140) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Increase the free spaces in CI ([#2131](https://github.com/kubeflow/katib/pull/2131) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Reformat katib-operators ([#2114](https://github.com/kubeflow/katib/pull/2114) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.15.0...v0.16.0-rc.0)
|
||||
|
||||
# [v0.15.0](https://github.com/kubeflow/katib/tree/v0.15.0) (2023-03-22)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Use **Never** Resume Policy as Default ([#2102](https://github.com/kubeflow/katib/pull/2102) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Chocolate Suggestion Service is removed ([#2071](https://github.com/kubeflow/katib/pull/2071) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- `request_number` is removed from the gRPC APIs ([#1994](https://github.com/kubeflow/katib/pull/1994) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Enabling Authorization in Katib UI ([#1983](https://github.com/kubeflow/katib/pull/1983) and [#2041](https://github.com/kubeflow/katib/pull/2041) by [@apo-ger](https://github.com/apo-ger))
|
||||
- The new improved and refactored Katib SDK is not backward compatible ([#2075](https://github.com/kubeflow/katib/pull/2075) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## New Features
|
||||
|
||||
### Major Features
|
||||
|
||||
- Narrow down Katib RBAC rules ([#2091](https://github.com/kubeflow/katib/pull/2091) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Support Postgres as a Katib DB ([#1921](https://github.com/kubeflow/katib/pull/1921) by [@anencore94](https://github.com/anencore94))
|
||||
- More Suggestion container fields in Katib Config ([#2000](https://github.com/kubeflow/katib/pull/2000) by [@fischor](https://github.com/fischor))
|
||||
- Katib UI: Create the LOGS tab of Trial's details page ([#2117](https://github.com/kubeflow/katib/pull/2117) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Katib UI: Enable pagination/sorting/filtering ([#2017](https://github.com/kubeflow/katib/pull/2017) and [#2040](https://github.com/kubeflow/katib/pull/2040) by [@elenzio9](https://github.com/elenzio9))
|
||||
- [SDK] Create Tune API in the Katib SDK ([#1951](https://github.com/kubeflow/katib/pull/1951) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Get Trial Metrics from Katib DB ([#2050](https://github.com/kubeflow/katib/pull/2050) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
### Core Features
|
||||
|
||||
- Add Conformance Program Doc for AutoML and Training WG ([#2048](https://github.com/kubeflow/katib/pull/2048) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Support for grid search algorithm in Optuna Suggestion Service ([#2060](https://github.com/kubeflow/katib/pull/2060) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add Trial Labels During Pod Mutation ([#2047](https://github.com/kubeflow/katib/pull/2047) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Support for k8s v1.25 in CI ([#1997](https://github.com/kubeflow/katib/pull/1997) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Add the CI to build multi-platform container images ([#1956](https://github.com/kubeflow/katib/pull/1956) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.21 and introduce Kubernetes v1.24 ([#1953](https://github.com/kubeflow/katib/pull/1953) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add --connect-timeout flag to katib-db-manager ([#1937](https://github.com/kubeflow/katib/pull/1937) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Implement validations for DARTS suggestion service ([#1926](https://github.com/kubeflow/katib/pull/1926) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Implement validation for Optuna suggestion service ([#1924](https://github.com/kubeflow/katib/pull/1924) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
### UI Improvements
|
||||
|
||||
- Make links in KWA's tables actual links ([#2090](https://github.com/kubeflow/katib/pull/2090) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Rework the trial graph using ECharts in KWA ([#2089](https://github.com/kubeflow/katib/pull/2089) by [@elenzio9](https://github.com/elenzio9))
|
||||
- kwa(front): Add UI tests with Cypress ([#2088](https://github.com/kubeflow/katib/pull/2088) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- frontend: Enable actions in experiment graph ([#2065](https://github.com/kubeflow/katib/pull/2065) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Show message in case of uncompleted trial instead of the graph ([#2063](https://github.com/kubeflow/katib/pull/2063) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Add source maps in the browser ([#2043](https://github.com/kubeflow/katib/pull/2043) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Backend for getting logs of a trial ([#2039](https://github.com/kubeflow/katib/pull/2039) by [@d-gol](https://github.com/d-gol))
|
||||
- frontend: Show the successful trials in the experiment graph (#2013) ([#2033](https://github.com/kubeflow/katib/pull/2033) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Migrate from tslint to eslint in KWA ([#2042](https://github.com/kubeflow/katib/pull/2042) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Dedicated yaml tab for Trials ([#2034](https://github.com/kubeflow/katib/pull/2034) by [@elenzio9](https://github.com/elenzio9))
|
||||
- KWA: Use new Editor component (Monaco) ([#2023](https://github.com/kubeflow/katib/pull/2023) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- kwa(build): Introduce COMMIT file for building KWA ([#2014](https://github.com/kubeflow/katib/pull/2014) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- frontend: Fix 500 error after detail page refresh (#1967) ([#2001](https://github.com/kubeflow/katib/pull/2001) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Introduce KWA's frontend component for kfp links ([#1991](https://github.com/kubeflow/katib/pull/1991) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Rename and right align the age column ([#1989](https://github.com/kubeflow/katib/pull/1989) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Show the trials table's status column first ([#1990](https://github.com/kubeflow/katib/pull/1990) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Make KWA's main table responsive and add toolbar ([#1982](https://github.com/kubeflow/katib/pull/1982) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Fix unit tests ([#1977](https://github.com/kubeflow/katib/pull/1977) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Format code ([#1979](https://github.com/kubeflow/katib/pull/1979) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- Recreate the Experiments Parallel Coordinates Graph ([#1974](https://github.com/kubeflow/katib/pull/1974) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Improve UI API/controller logging to ease troubleshooting ([#1966](https://github.com/kubeflow/katib/pull/1966) by [@lukeogg](https://github.com/lukeogg))
|
||||
|
||||
### SDK Improvements
|
||||
|
||||
- [SDK] Use Katib SDK for E2E Tests ([#2075](https://github.com/kubeflow/katib/pull/2075) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Use Katib Client without Kube Config ([#2098](https://github.com/kubeflow/katib/pull/2098) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Fix namespace parameter in tune API ([#1981](https://github.com/kubeflow/katib/pull/1981) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Remove Final Keyword from constants ([#1980](https://github.com/kubeflow/katib/pull/1980) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Fix Release Script for Updating SDK Version ([#2104](https://github.com/kubeflow/katib/pull/2104) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [Fix] add early stopped trials in converter ([#2004](https://github.com/kubeflow/katib/pull/2004) by [@shaowei-su](https://github.com/shaowei-su))
|
||||
- [bugfix] Fix value passing bug in New Experiment form ([#2027](https://github.com/kubeflow/katib/pull/2027) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- Fix main process retrieve logic for early stopping ([#1988](https://github.com/kubeflow/katib/pull/1988) by [@shaowei-su](https://github.com/shaowei-su))
|
||||
- [hotfix]: filter by name of experiment ([#1920](https://github.com/kubeflow/katib/pull/1920) by [@anencore94](https://github.com/anencore94))
|
||||
- Fix push script to include new images ([#1911](https://github.com/kubeflow/katib/pull/1911) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- fix: only validate Kubernetes Job ([#2025](https://github.com/kubeflow/katib/pull/2025) by [@zhixian82](https://github.com/zhixian82))
|
||||
- Upgrade grpc-health-probe version to fix some security issues ([#2093](https://github.com/kubeflow/katib/pull/2093) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Format Katib Charm Operator ([#2115](https://github.com/kubeflow/katib/pull/2115) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## Documentation
|
||||
|
||||
- Add CERN to adopters ([#2010](https://github.com/kubeflow/katib/pull/2010) by [@d-gol](https://github.com/d-gol))
|
||||
- Add More Katib Presentations 2022 ([#2009](https://github.com/kubeflow/katib/pull/2009) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add the documentation for simple-pbt ([#1978](https://github.com/kubeflow/katib/pull/1978) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add the license to pbt ([#1958](https://github.com/kubeflow/katib/pull/1958) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update the Katib version in docs ([#1950](https://github.com/kubeflow/katib/pull/1950) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update CHANGELOG for v0.14.0 release ([#1932](https://github.com/kubeflow/katib/pull/1932) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
|
||||
## Misc
|
||||
|
||||
- Update Training operator Image in CI ([#2103](https://github.com/kubeflow/katib/pull/2103) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Upgrade Go libraries to resolve security issues ([#2094](https://github.com/kubeflow/katib/pull/2094) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Run e2e with various Python versions to verify Python SDK ([#2092](https://github.com/kubeflow/katib/pull/2092) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add a --prefer-binary flag to 'pip install' command ([#2096](https://github.com/kubeflow/katib/pull/2096) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade PyTorch version to v1.13.0 ([#2082](https://github.com/kubeflow/katib/pull/2082) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade TensorFlow version ([#2079](https://github.com/kubeflow/katib/pull/2079) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade Python version to 3.10 ([#2057](https://github.com/kubeflow/katib/pull/2057) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Pin the NumPy version with v1.23.5 in some images ([#2070](https://github.com/kubeflow/katib/pull/2070) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade the actions-setup-minikube version to v2.7.2 ([#2064](https://github.com/kubeflow/katib/pull/2064) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove Certificate Chain from Cert Generator ([#2045](https://github.com/kubeflow/katib/pull/2045) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add resources to earlystopping container ([#2038](https://github.com/kubeflow/katib/pull/2038) by [@zhixian82](https://github.com/zhixian82))
|
||||
- Add scripts to verify generated codes and Go Modules ([#1999](https://github.com/kubeflow/katib/pull/1999) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [Test] Reduce Katib GitHub Action Runs ([#2036](https://github.com/kubeflow/katib/pull/2036) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- gh-actions: Extend action to run Frontend Unit tests ([#1998](https://github.com/kubeflow/katib/pull/1998) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- [chore] Upgrade docker/metadata-action, actions/checkout, and actions/setup-python version ([#1996](https://github.com/kubeflow/katib/pull/1996) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [chore] Upgrade Go version to v1.19 ([#1995](https://github.com/kubeflow/katib/pull/1995) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in simple-pbt image ([#1948](https://github.com/kubeflow/katib/pull/1948) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support arm64 in darts-cnn-cifar10 image ([#1947](https://github.com/kubeflow/katib/pull/1947) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in enas-cnn-cifar10 image ([#1944](https://github.com/kubeflow/katib/pull/1944) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in pytorch-mnist image ([#1943](https://github.com/kubeflow/katib/pull/1943) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in mxnet-mnist image ([#1940](https://github.com/kubeflow/katib/pull/1940) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Use the katib-new-ui for Charmed gh-actions ([#1987](https://github.com/kubeflow/katib/pull/1987) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [feat] health check for katib-controller ([#1934](https://github.com/kubeflow/katib/pull/1934) by [@anencore94](https://github.com/anencore94))
|
||||
- Upgrade Optuna from v2.x.x to v3.0.0 ([#1942](https://github.com/kubeflow/katib/pull/1942) by [@keisuke-umezawa](https://github.com/keisuke-umezawa))
|
||||
- Add validation webhooks for maxFailedTrialCount and parallelTrialCount ([#1936](https://github.com/kubeflow/katib/pull/1936) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Introduce Automatic platform ARGs ([#1935](https://github.com/kubeflow/katib/pull/1935) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update training operator image in CI ([#1933](https://github.com/kubeflow/katib/pull/1933) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Update Katib SDK version ([#1931](https://github.com/kubeflow/katib/pull/1931) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- [chore] Upgrade Go version to v1.18 ([#1925](https://github.com/kubeflow/katib/pull/1925) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add the pytorch-mnist with GPU support container image ([#1916](https://github.com/kubeflow/katib/pull/1916) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.14.0...v0.15.0)
|
||||
|
||||
# [v0.15.0-rc.1](https://github.com/kubeflow/katib/tree/v0.15.0-rc.1) (2023-02-15)
|
||||
|
||||
## New Features
|
||||
|
||||
- UI: Create the LOGS tab of Trial's details page ([#2117](https://github.com/kubeflow/katib/pull/2117) by [@elenzio9](https://github.com/elenzio9))
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
- Format Katib Charm Operator ([#2115](https://github.com/kubeflow/katib/pull/2115) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.15.0-rc.0...v0.15.0-rc.1)
|
||||
|
||||
# [v0.15.0-rc.0](https://github.com/kubeflow/katib/tree/v0.15.0-rc.0) (2023-01-27)
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
- Use **Never** Resume Policy as Default ([#2102](https://github.com/kubeflow/katib/pull/2102) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Chocolate Suggestion Service is removed ([#2071](https://github.com/kubeflow/katib/pull/2071) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- `request_number` is removed from the gRPC APIs ([#1994](https://github.com/kubeflow/katib/pull/1994) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- The new improved and refactored Katib SDK is not backward compatible ([#2075](https://github.com/kubeflow/katib/pull/2075) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## New Features
|
||||
|
||||
### Major Features
|
||||
|
||||
- Narrow down Katib RBAC rules ([#2091](https://github.com/kubeflow/katib/pull/2091) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Support Postgres as a Katib DB ([#1921](https://github.com/kubeflow/katib/pull/1921) by [@anencore94](https://github.com/anencore94))
|
||||
- More Suggestion container fields in Katib Config ([#2000](https://github.com/kubeflow/katib/pull/2000) by [@fischor](https://github.com/fischor))
|
||||
- Katib UI: Enable pagination/sorting/filtering ([#2017](https://github.com/kubeflow/katib/pull/2017) and [#2040](https://github.com/kubeflow/katib/pull/2040) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Katib UI: Add authorization mechanisms ([#1983](https://github.com/kubeflow/katib/pull/1983) by [@apo-ger](https://github.com/apo-ger))
|
||||
- [SDK] Create Tune API in the Katib SDK ([#1951](https://github.com/kubeflow/katib/pull/1951) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Get Trial Metrics from Katib DB ([#2050](https://github.com/kubeflow/katib/pull/2050) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
### Core Features
|
||||
|
||||
- Add Conformance Program Doc for AutoML and Training WG ([#2048](https://github.com/kubeflow/katib/pull/2048) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Support for grid search algorithm in Optuna Suggestion Service ([#2060](https://github.com/kubeflow/katib/pull/2060) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add Trial Labels During Pod Mutation ([#2047](https://github.com/kubeflow/katib/pull/2047) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Support for k8s v1.25 in CI ([#1997](https://github.com/kubeflow/katib/pull/1997) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Add the CI to build multi-platform container images ([#1956](https://github.com/kubeflow/katib/pull/1956) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Drop Kubernetes v1.21 and introduce Kubernetes v1.24 ([#1953](https://github.com/kubeflow/katib/pull/1953) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add --connect-timeout flag to katib-db-manager ([#1937](https://github.com/kubeflow/katib/pull/1937) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Implement validations for DARTS suggestion service ([#1926](https://github.com/kubeflow/katib/pull/1926) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Implement validation for Optuna suggestion service ([#1924](https://github.com/kubeflow/katib/pull/1924) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
### UI Improvements
|
||||
|
||||
- Make links in KWA's tables actual links ([#2090](https://github.com/kubeflow/katib/pull/2090) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Rework the trial graph using ECharts in KWA ([#2089](https://github.com/kubeflow/katib/pull/2089) by [@elenzio9](https://github.com/elenzio9))
|
||||
- kwa(front): Add UI tests with Cypress ([#2088](https://github.com/kubeflow/katib/pull/2088) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- Update manifests to enable authorization check mechanisms for Katib UI in Kubeflow mode ([#2041](https://github.com/kubeflow/katib/pull/2041) by [@apo-ger](https://github.com/apo-ger))
|
||||
- frontend: Enable actions in experiment graph ([#2065](https://github.com/kubeflow/katib/pull/2065) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Show message in case of uncompleted trial instead of the graph ([#2063](https://github.com/kubeflow/katib/pull/2063) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Add source maps in the browser ([#2043](https://github.com/kubeflow/katib/pull/2043) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Backend for getting logs of a trial ([#2039](https://github.com/kubeflow/katib/pull/2039) by [@d-gol](https://github.com/d-gol))
|
||||
- frontend: Show the successful trials in the experiment graph (#2013) ([#2033](https://github.com/kubeflow/katib/pull/2033) by [@elenzio9](https://github.com/elenzio9))
|
||||
- frontend: Migrate from tslint to eslint in KWA ([#2042](https://github.com/kubeflow/katib/pull/2042) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Dedicated yaml tab for Trials ([#2034](https://github.com/kubeflow/katib/pull/2034) by [@elenzio9](https://github.com/elenzio9))
|
||||
- KWA: Use new Editor component (Monaco) ([#2023](https://github.com/kubeflow/katib/pull/2023) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- kwa(build): Introduce COMMIT file for building KWA ([#2014](https://github.com/kubeflow/katib/pull/2014) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- frontend: Fix 500 error after detail page refresh (#1967) ([#2001](https://github.com/kubeflow/katib/pull/2001) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Introduce KWA's frontend component for kfp links ([#1991](https://github.com/kubeflow/katib/pull/1991) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Rename and right align the age column ([#1989](https://github.com/kubeflow/katib/pull/1989) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Show the trials table's status column first ([#1990](https://github.com/kubeflow/katib/pull/1990) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Make KWA's main table responsive and add toolbar ([#1982](https://github.com/kubeflow/katib/pull/1982) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Fix unit tests ([#1977](https://github.com/kubeflow/katib/pull/1977) by [@elenzio9](https://github.com/elenzio9))
|
||||
- UI: Format code ([#1979](https://github.com/kubeflow/katib/pull/1979) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- Recreate the Experiments Parallel Coordinates Graph ([#1974](https://github.com/kubeflow/katib/pull/1974) by [@elenzio9](https://github.com/elenzio9))
|
||||
- Improve UI API/controller logging to ease troubleshooting ([#1966](https://github.com/kubeflow/katib/pull/1966) by [@lukeogg](https://github.com/lukeogg))
|
||||
|
||||
### SDK Improvements
|
||||
|
||||
- [SDK] Use Katib SDK for E2E Tests ([#2075](https://github.com/kubeflow/katib/pull/2075) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Use Katib Client without Kube Config ([#2098](https://github.com/kubeflow/katib/pull/2098) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Fix namespace parameter in tune API ([#1981](https://github.com/kubeflow/katib/pull/1981) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [SDK] Remove Final Keyword from constants ([#1980](https://github.com/kubeflow/katib/pull/1980) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
||||
## Bug fixes
|
||||
|
||||
- Fix Release Script for Updating SDK Version ([#2104](https://github.com/kubeflow/katib/pull/2104) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- [Fix] add early stopped trials in converter ([#2004](https://github.com/kubeflow/katib/pull/2004) by [@shaowei-su](https://github.com/shaowei-su))
|
||||
- [bugfix] Fix value passing bug in New Experiment form ([#2027](https://github.com/kubeflow/katib/pull/2027) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- Fix main process retrieve logic for early stopping ([#1988](https://github.com/kubeflow/katib/pull/1988) by [@shaowei-su](https://github.com/shaowei-su))
|
||||
- [hotfix]: filter by name of experiment ([#1920](https://github.com/kubeflow/katib/pull/1920) by [@anencore94](https://github.com/anencore94))
|
||||
- Fix push script to include new images ([#1911](https://github.com/kubeflow/katib/pull/1911) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- fix: only validate Kubernetes Job ([#2025](https://github.com/kubeflow/katib/pull/2025) by [@zhixian82](https://github.com/zhixian82))
|
||||
- Upgrade grpc-health-probe version to fix some security issues ([#2093](https://github.com/kubeflow/katib/pull/2093) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## Documentation
|
||||
|
||||
- Add CERN to adopters ([#2010](https://github.com/kubeflow/katib/pull/2010) by [@d-gol](https://github.com/d-gol))
|
||||
- Add More Katib Presentations 2022 ([#2009](https://github.com/kubeflow/katib/pull/2009) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add the documentation for simple-pbt ([#1978](https://github.com/kubeflow/katib/pull/1978) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add the license to pbt ([#1958](https://github.com/kubeflow/katib/pull/1958) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update the Katib version in docs ([#1950](https://github.com/kubeflow/katib/pull/1950) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update CHANGELOG for v0.14.0 release ([#1932](https://github.com/kubeflow/katib/pull/1932) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
|
||||
## Misc
|
||||
|
||||
- Update Training operator Image in CI ([#2103](https://github.com/kubeflow/katib/pull/2103) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Upgrade Go libraries to resolve security issues ([#2094](https://github.com/kubeflow/katib/pull/2094) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Run e2e with various Python versions to verify Python SDK ([#2092](https://github.com/kubeflow/katib/pull/2092) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add a --prefer-binary flag to 'pip install' command ([#2096](https://github.com/kubeflow/katib/pull/2096) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade PyTorch version to v1.13.0 ([#2082](https://github.com/kubeflow/katib/pull/2082) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade Tensorflow version ([#2079](https://github.com/kubeflow/katib/pull/2079) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade Python version to 3.10 ([#2057](https://github.com/kubeflow/katib/pull/2057) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Pin the NumPy version with v1.23.5 in some images ([#2070](https://github.com/kubeflow/katib/pull/2070) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade the actions-setup-minikube version to v2.7.2 ([#2064](https://github.com/kubeflow/katib/pull/2064) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove Certificate Chain from Cert Generator ([#2045](https://github.com/kubeflow/katib/pull/2045) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Add resources to earlystopping container ([#2038](https://github.com/kubeflow/katib/pull/2038) by [@zhixian82](https://github.com/zhixian82))
|
||||
- Add scripts to verify generated codes and Go Modules ([#1999](https://github.com/kubeflow/katib/pull/1999) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [Test] Reduce Katib GitHub Action Runs ([#2036](https://github.com/kubeflow/katib/pull/2036) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- gh-actions: Extend action to run Frontend Unit tests ([#1998](https://github.com/kubeflow/katib/pull/1998) by [@orfeas-k](https://github.com/orfeas-k))
|
||||
- [chore] Upgrade docker/metadata-action, actions/checkout, and actions/setup-python version ([#1996](https://github.com/kubeflow/katib/pull/1996) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [chore] Upgrade Go version to v1.19 ([#1995](https://github.com/kubeflow/katib/pull/1995) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in simple-pbt image ([#1948](https://github.com/kubeflow/katib/pull/1948) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support arm64 in darts-cnn-cifar10 image ([#1947](https://github.com/kubeflow/katib/pull/1947) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in enas-cnn-cifar10 image ([#1944](https://github.com/kubeflow/katib/pull/1944) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in pytorch-mnist image ([#1943](https://github.com/kubeflow/katib/pull/1943) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Support for arm64 in mxnet-mnist image ([#1940](https://github.com/kubeflow/katib/pull/1940) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Use the katib-new-ui for Charmed gh-actions ([#1987](https://github.com/kubeflow/katib/pull/1987) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- [feat] health check for katib-controller ([#1934](https://github.com/kubeflow/katib/pull/1934) by [@anencore94](https://github.com/anencore94))
|
||||
- Upgrade Optuna from v2.x.x to v3.0.0 ([#1942](https://github.com/kubeflow/katib/pull/1942) by [@keisuke-umezawa](https://github.com/keisuke-umezawa))
|
||||
- Add validation webhooks for maxFailedTrialCount and parallelTrialCount ([#1936](https://github.com/kubeflow/katib/pull/1936) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Introduce Automatic platform ARGs ([#1935](https://github.com/kubeflow/katib/pull/1935) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update training operator image in CI ([#1933](https://github.com/kubeflow/katib/pull/1933) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Update Katib SDK version ([#1931](https://github.com/kubeflow/katib/pull/1931) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- [chore] Upgrade Go version to v1.18 ([#1925](https://github.com/kubeflow/katib/pull/1925) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add the pytorch-mnist with GPU support container image ([#1916](https://github.com/kubeflow/katib/pull/1916) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.14.0...v0.15.0-rc.0)
|
||||
|
||||
# [v0.14.0](https://github.com/kubeflow/katib/tree/v0.14.0) (2022-08-18)
|
||||
|
||||
## New Features
|
||||
|
||||
### Core Features
|
||||
|
||||
- Population based training ([#1833](https://github.com/kubeflow/katib/pull/1833) by [@a9p](https://github.com/a9p))
|
||||
- Support JSON format logs in `file-metrics-collector` ([#1765](https://github.com/kubeflow/katib/pull/1765) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Include MetricsUnavailable condition to Complete in Trial ([#1877](https://github.com/kubeflow/katib/pull/1877) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Allow running examples on Apple Silicon M1 and fix image build errors for arm64 ([#1898](https://github.com/kubeflow/katib/pull/1898) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Configurable job name and service name for cert generator ([#1889](https://github.com/kubeflow/katib/pull/1889) by [@shaowei-su](https://github.com/shaowei-su))
|
||||
|
||||
### UI Features and Enhancements
|
||||
|
||||
- Add PBT to experiment creation form ([#1909](https://github.com/kubeflow/katib/pull/1909) by [@a9p](https://github.com/a9p))
|
||||
- Distinct page for each Trial in the UI ([#1783](https://github.com/kubeflow/katib/pull/1783) by [@d-gol](https://github.com/d-gol))
|
||||
|
||||
## Bug fixes
|
||||
|
||||
- Add the pytorch-mnist with GPU support container image ([#1917](https://github.com/kubeflow/katib/pull/1917) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Fix push script to include new images ([#1912](https://github.com/kubeflow/katib/pull/1912) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Fixes lint warnings in YAML files ([#1902](https://github.com/kubeflow/katib/pull/1902) by [@Rishit-dagli](https://github.com/Rishit-dagli))
|
||||
- Fix errors when running the test on Apple Silicon M1 ([#1886](https://github.com/kubeflow/katib/pull/1886) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Reconcile trial assignments by comparing suggestion and trials being executed ([#1831](https://github.com/kubeflow/katib/pull/1831) by [@henrysecond1](https://github.com/henrysecond1))
|
||||
- Increase the probes seconds in manifests ([#1845](https://github.com/kubeflow/katib/pull/1845) by [@haoxins](https://github.com/haoxins))
|
||||
- Set upper constraint for Optuna ([#1852](https://github.com/kubeflow/katib/pull/1852) by [@himkt](https://github.com/himkt))
|
||||
- Don't check if trial's metadata is in spec.parameters ([#1848](https://github.com/kubeflow/katib/pull/1848) by [@alexeygorobets](https://github.com/alexeygorobets))
|
||||
|
||||
## Documentation
|
||||
|
||||
- Fix the FPGA examples documentation ([#1841](https://github.com/kubeflow/katib/pull/1841) by [@eliaskoromilas](https://github.com/eliaskoromilas))
|
||||
- Add CyberAgent to adopters ([#1894](https://github.com/kubeflow/katib/pull/1894) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
## Misc
|
||||
|
||||
- Updating the training operator image in CI ([#1910](https://github.com/kubeflow/katib/pull/1910) by [@johnugeorge](https://github.com/johnugeorge))
|
||||
- Upgrade Python and Pytorch versions for some examples ([#1906](https://github.com/kubeflow/katib/pull/1906) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Linting for K8s YAML files ([#1901](https://github.com/kubeflow/katib/pull/1901) by [@Rishit-dagli](https://github.com/Rishit-dagli))
|
||||
- Change integration test system from KinD Cluster to Minikube Cluster ([#1899](https://github.com/kubeflow/katib/pull/1899) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade mysql version to v8.0.29 ([#1897](https://github.com/kubeflow/katib/pull/1897) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade tensorflow-aarch64 version to v2.9.1 ([#1891](https://github.com/kubeflow/katib/pull/1891) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- chore: Upgrade Go libraries to resolve some security issues in the katib-controller ([#1888](https://github.com/kubeflow/katib/pull/1888) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Migrate kubeflow-katib-presubmit to GitHub Actions ([#1882](https://github.com/kubeflow/katib/pull/1882) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add semicolon when using `command` command in Makefile ([#1885](https://github.com/kubeflow/katib/pull/1885) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Fix `HAS_SHELLCHECK` and `HAS_SETUP_ENVTEST` in Makefile ([#1884](https://github.com/kubeflow/katib/pull/1884) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Remove presubmit tests depending on optional-test-infra ([#1871](https://github.com/kubeflow/katib/pull/1871) by [@aws-kf-ci-bot](https://github.com/aws-kf-ci-bot))
|
||||
- Upgrade the Tensorflow version to address some security issues ([#1870](https://github.com/kubeflow/katib/pull/1870) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade the grpc_health_probe version to v0.4.11 to resolve security vulnerability CVE-2022-27191 ([#1875](https://github.com/kubeflow/katib/pull/1875) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- additional metric names should not include objective metric name ([#1874](https://github.com/kubeflow/katib/pull/1874) by [@henrysecond1](https://github.com/henrysecond1))
|
||||
- Upgrade the Kubernetes Python client to 22.6.0 ([#1869](https://github.com/kubeflow/katib/pull/1869) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Upgrade the kubebuilder to v3.2.0 and Kubernetes Go libraries to v1.22.2 ([#1861](https://github.com/kubeflow/katib/pull/1861) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Update FPGA XGBoost example ([#1865](https://github.com/kubeflow/katib/pull/1865) by [@eliaskoromilas](https://github.com/eliaskoromilas))
|
||||
- Fix kubeflowkatib/mxnet-mnist image ([#1866](https://github.com/kubeflow/katib/pull/1866) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- pins pip and setuptools versions operators to avoid installation issues ([#1867](https://github.com/kubeflow/katib/pull/1867) by [@DnPlas](https://github.com/DnPlas))
|
||||
- Add shellcheck ([#1857](https://github.com/kubeflow/katib/pull/1857) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Bump kubeflow-katib and kfp version in notebook examples ([#1849](https://github.com/kubeflow/katib/pull/1849) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Add prometheus scraping and grafana support to charmed katib-controller operator ([#1839](https://github.com/kubeflow/katib/pull/1839) by [@jardon](https://github.com/jardon))
|
||||
- Upgrade Black to fix linting ([#1842](https://github.com/kubeflow/katib/pull/1842) by [@jardon](https://github.com/jardon))
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.13.0...v0.14.0).
|
||||
|
||||
# [v0.13.0](https://github.com/kubeflow/katib/tree/v0.13.0) (2022-03-04)
|
||||
## [v0.13.0](https://github.com/kubeflow/katib/tree/v0.13.0) (2022-03-04)
|
||||
|
||||
## New Features
|
||||
|
||||
|
@ -874,6 +59,7 @@
|
|||
- Fix default label for Training Operators ([#1813](https://github.com/kubeflow/katib/pull/1813) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
- Update supported Python version for Katib SDK ([#1798](https://github.com/kubeflow/katib/pull/1798) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
|
||||
## Misc
|
||||
|
||||
- Use release tags for Trial images ([#1757](https://github.com/kubeflow/katib/pull/1757) by [@andreyvelich](https://github.com/andreyvelich))
|
||||
|
@ -888,9 +74,10 @@
|
|||
- Add envtest to check `reconcileRBAC` ([#1678](https://github.com/kubeflow/katib/pull/1678) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
- Use golangci-lint as linter for Go ([#1671](https://github.com/kubeflow/katib/pull/1671) by [@tenzen-y](https://github.com/tenzen-y))
|
||||
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.12.0...v0.13.0)
|
||||
|
||||
# [v0.13.0-rc.1](https://github.com/kubeflow/katib/tree/v0.13.0-rc.1) (2022-02-15)
|
||||
## [v0.13.0-rc.1](https://github.com/kubeflow/katib/tree/v0.13.0-rc.1) (2022-02-15)
|
||||
|
||||
## Bug fixes
|
||||
|
||||
|
@ -899,7 +86,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.13.0-rc.0...v0.13.0-rc.1)
|
||||
|
||||
# [v0.13.0-rc.0](https://github.com/kubeflow/katib/tree/v0.13.0-rc.0) (2022-01-25)
|
||||
## [v0.13.0-rc.0](https://github.com/kubeflow/katib/tree/v0.13.0-rc.0) (2022-01-25)
|
||||
|
||||
## New Features
|
||||
|
||||
|
@ -972,7 +159,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.12.0...v0.13.0-rc.0)
|
||||
|
||||
# [v0.12.0](https://github.com/kubeflow/katib/tree/v0.12.0) (2021-10-05)
|
||||
## [v0.12.0](https://github.com/kubeflow/katib/tree/v0.12.0) (2021-10-05)
|
||||
|
||||
## New Features
|
||||
|
||||
|
@ -1028,7 +215,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.11.1...v0.12.0)
|
||||
|
||||
# [v0.12.0-rc.1](https://github.com/kubeflow/katib/tree/v0.12.0-rc.1) (2021-09-07)
|
||||
## [v0.12.0-rc.1](https://github.com/kubeflow/katib/tree/v0.12.0-rc.1) (2021-09-07)
|
||||
|
||||
## Bug Fixes
|
||||
|
||||
|
@ -1037,7 +224,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.12.0-rc.0...v0.12.0-rc.1)
|
||||
|
||||
# [v0.12.0-rc.0](https://github.com/kubeflow/katib/tree/v0.12.0-rc.0) (2021-08-19)
|
||||
## [v0.12.0-rc.0](https://github.com/kubeflow/katib/tree/v0.12.0-rc.0) (2021-08-19)
|
||||
|
||||
## New Features
|
||||
|
||||
|
@ -1091,7 +278,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.11.1...v0.12.0-rc.0)
|
||||
|
||||
# [v0.11.1](https://github.com/kubeflow/katib/tree/v0.11.1) (2021-06-09)
|
||||
## [v0.11.1](https://github.com/kubeflow/katib/tree/v0.11.1) (2021-06-09)
|
||||
|
||||
## Bug fixes
|
||||
|
||||
|
@ -1105,7 +292,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.11.0...v0.11.1)
|
||||
|
||||
# [v0.11.0](https://github.com/kubeflow/katib/tree/v0.11.0) (2021-03-22)
|
||||
## [v0.11.0](https://github.com/kubeflow/katib/tree/v0.11.0) (2021-03-22)
|
||||
|
||||
## New Features
|
||||
|
||||
|
@ -1162,7 +349,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.10.1...v0.11.0)
|
||||
|
||||
# [v0.10.1](https://github.com/kubeflow/katib/tree/v0.10.1) (2021-03-02)
|
||||
## [v0.10.1](https://github.com/kubeflow/katib/tree/v0.10.1) (2021-03-02)
|
||||
|
||||
## Features and Bug Fixes
|
||||
|
||||
|
@ -1196,7 +383,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.10.0...v0.10.1)
|
||||
|
||||
# [v0.10.0](https://github.com/kubeflow/katib/tree/v0.10.0) (2020-11-07)
|
||||
## [v0.10.0](https://github.com/kubeflow/katib/tree/v0.10.0) (2020-11-07)
|
||||
|
||||
## New Features
|
||||
|
||||
|
@ -1240,7 +427,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.9.0...v0.10.0)
|
||||
|
||||
# [v0.9.0](https://github.com/kubeflow/katib/tree/v0.9.0) (2020-06-10)
|
||||
## [v0.9.0](https://github.com/kubeflow/katib/tree/v0.9.0) (2020-06-10)
|
||||
|
||||
## Features and Bug Fixes
|
||||
|
||||
|
@ -1497,7 +684,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.6.0-rc.0...v0.9.0)
|
||||
|
||||
# [v0.6.0-rc.0](https://github.com/kubeflow/katib/tree/v0.6.0-rc.0) (2019-06-28)
|
||||
## [v0.6.0-rc.0](https://github.com/kubeflow/katib/tree/v0.6.0-rc.0) (2019-06-28)
|
||||
|
||||
## Features and Bug Fixes
|
||||
|
||||
|
@ -1752,7 +939,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/826657c14602a3f36263f3d6769451af0a75d18a...v0.6.0-rc.0)
|
||||
|
||||
# [0.2](https://github.com/kubeflow/katib/tree/0.2) (2018-08-20)
|
||||
## [0.2](https://github.com/kubeflow/katib/tree/0.2) (2018-08-20)
|
||||
|
||||
## Features
|
||||
|
||||
|
@ -1779,7 +966,7 @@
|
|||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.1.2-alpha...826657c14602a3f36263f3d6769451af0a75d18a)
|
||||
|
||||
# [v0.1.2-alpha](https://github.com/kubeflow/katib/tree/v0.1.2-alpha) (2018-06-05)
|
||||
## [v0.1.2-alpha](https://github.com/kubeflow/katib/tree/v0.1.2-alpha) (2018-06-05)
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.1.1-alpha...v0.1.2-alpha)
|
||||
|
||||
|
@ -1810,7 +997,7 @@
|
|||
- Refine API [\#74](https://github.com/kubeflow/katib/pull/74) ([YujiOshima](https://github.com/YujiOshima))
|
||||
- worker: Rename worker_interface to worker [\#70](https://github.com/kubeflow/katib/pull/70) ([gaocegege](https://github.com/gaocegege))
|
||||
|
||||
# [v0.1.1-alpha](https://github.com/kubeflow/katib/tree/v0.1.1-alpha) (2018-04-24)
|
||||
## [v0.1.1-alpha](https://github.com/kubeflow/katib/tree/v0.1.1-alpha) (2018-04-24)
|
||||
|
||||
[Full Changelog](https://github.com/kubeflow/katib/compare/v0.1.0-alpha...v0.1.1-alpha)
|
||||
|
||||
|
@ -1848,7 +1035,7 @@
|
|||
- New db log schema [\#35](https://github.com/kubeflow/katib/pull/35) ([YujiOshima](https://github.com/YujiOshima))
|
||||
- Fix CI failures [\#27](https://github.com/kubeflow/katib/pull/27) ([gaocegege](https://github.com/gaocegege))
|
||||
|
||||
# [v0.1.0-alpha](https://github.com/kubeflow/katib/tree/v0.1.0-alpha) (2018-04-10)
|
||||
## [v0.1.0-alpha](https://github.com/kubeflow/katib/tree/v0.1.0-alpha) (2018-04-10)
|
||||
|
||||
**Closed issues:**
|
||||
|
||||
|
|
43
CITATION.cff
|
@ -1,43 +0,0 @@
|
|||
cff-version: 1.2.0
|
||||
message: "If you use Katib in your scientific publication, please cite it as below."
|
||||
authors:
|
||||
- family-names: "George"
|
||||
given-names: "Johnu"
|
||||
- family-names: "Gao"
|
||||
given-names: "Ce"
|
||||
- family-names: "Liu"
|
||||
given-names: "Richard"
|
||||
- family-names: "Liu"
|
||||
given-names: "Hou Gang"
|
||||
- family-names: "Tang"
|
||||
given-names: "Yuan"
|
||||
- family-names: "Pydipaty"
|
||||
given-names: "Ramdoot"
|
||||
- family-names: "Saha"
|
||||
given-names: "Amit Kumar"
|
||||
title: "Katib"
|
||||
type: software
|
||||
repository-code: "https://github.com/kubeflow/katib"
|
||||
preferred-citation:
|
||||
type: misc
|
||||
title: "A Scalable and Cloud-Native Hyperparameter Tuning System"
|
||||
authors:
|
||||
- family-names: "George"
|
||||
given-names: "Johnu"
|
||||
- family-names: "Gao"
|
||||
given-names: "Ce"
|
||||
- family-names: "Liu"
|
||||
given-names: "Richard"
|
||||
- family-names: "Liu"
|
||||
given-names: "Hou Gang"
|
||||
- family-names: "Tang"
|
||||
given-names: "Yuan"
|
||||
- family-names: "Pydipaty"
|
||||
given-names: "Ramdoot"
|
||||
- family-names: "Saha"
|
||||
given-names: "Amit Kumar"
|
||||
year: 2020
|
||||
url: "https://arxiv.org/abs/2006.02085"
|
||||
identifiers:
|
||||
- type: "other"
|
||||
value: "arXiv:2006.02085"
|
167
CONTRIBUTING.md
|
@ -1,167 +0,0 @@
|
|||
# Developer Guide
|
||||
|
||||
This developer guide is for people who want to contribute to the Katib project.
|
||||
If you're interested in using Katib in your machine learning project,
|
||||
see the following guides:
|
||||
|
||||
- [Getting started with Katib](https://kubeflow.org/docs/components/katib/hyperparameter/).
|
||||
- [How to configure Katib Experiment](https://kubeflow.org/docs/components/katib/experiment/).
|
||||
- [Katib architecture and concepts](https://www.kubeflow.org/docs/components/katib/reference/architecture/)
|
||||
for hyperparameter tuning and neural architecture search.
|
||||
|
||||
## Requirements
|
||||
|
||||
- [Go](https://golang.org/) (1.22 or later)
|
||||
- [Docker](https://docs.docker.com/) (24.0 or later)
|
||||
- [Docker Buildx](https://docs.docker.com/build/buildx/) (0.8.0 or later)
|
||||
- [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html) (8 or later)
|
||||
- [Python](https://www.python.org/) (3.11 or later)
|
||||
- [kustomize](https://kustomize.io/) (4.0.5 or later)
|
||||
- [pre-commit](https://pre-commit.com/)
|
||||
|
||||
## Build from source code
|
||||
|
||||
**Note** that your Docker Desktop should
|
||||
[enable containerd image store](https://docs.docker.com/desktop/containerd/#enable-the-containerd-image-store)
|
||||
to build multi-arch images. Build the source code as follows:
|
||||
|
||||
```bash
|
||||
make build REGISTRY=<image-registry> TAG=<image-tag>
|
||||
```
|
||||
|
||||
If you are using an Apple Silicon machine and encounter the "rosetta error: bss_size overflow," go to Docker Desktop -> General and uncheck "Use Rosetta for x86_64/amd64 emulation on Apple Silicon."
|
||||
|
||||
To use your custom images for the Katib components, modify
|
||||
[Kustomization file](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/kustomization.yaml)
|
||||
and [Katib Config](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/katib-config.yaml).
|
||||
|
||||
You can deploy Katib v1beta1 manifests into a Kubernetes cluster as follows:
|
||||
|
||||
```bash
|
||||
make deploy
|
||||
```
|
||||
|
||||
You can undeploy Katib v1beta1 manifests from a Kubernetes cluster as follows:
|
||||
|
||||
```bash
|
||||
make undeploy
|
||||
```
|
||||
|
||||
## Technical and style guide
|
||||
|
||||
The following guidelines apply primarily to Katib,
|
||||
but other projects like [Training Operator](https://github.com/kubeflow/training-operator) might also adhere to them.
|
||||
|
||||
## Go Development
|
||||
|
||||
When coding:
|
||||
|
||||
- Follow [effective go](https://go.dev/doc/effective_go) guidelines.
|
||||
- Run locally [`make check`](https://github.com/kubeflow/katib/blob/46173463027e4fd2e604e25d7075b2b31a702049/Makefile#L31)
|
||||
to verify if changes follow best practices before submitting PRs.
|
||||
|
||||
Testing:
|
||||
|
||||
- Use [`cmp.Diff`](https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff) instead of `reflect.DeepEqual` to get readable diffs when a comparison fails.
|
||||
- Define test cases as maps instead of slices to avoid dependencies on the running order.
|
||||
The map key should be the test case name.
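
The map-keyed layout recommended above can be sketched as follows. This is a minimal, standard-library-only illustration: `normalizeLabel`, `testCase`, and `runCases` are hypothetical names that do not come from the Katib code base. In a real `_test.go` file the loop body would typically call `t.Run(name, ...)` and compare values with `cmp.Diff`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizeLabel is a hypothetical function under test.
func normalizeLabel(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// testCase holds one input and the output expected for it.
type testCase struct {
	input string
	want  string
}

// runCases evaluates every case and returns the names of the failing ones.
// Go randomizes map iteration order, so cases cannot rely on running in sequence.
func runCases(cases map[string]testCase) []string {
	var failed []string
	for name, tc := range cases {
		if got := normalizeLabel(tc.input); got != tc.want {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed) // sorted only to make the report deterministic
	return failed
}

func main() {
	// The map key doubles as the test case name.
	cases := map[string]testCase{
		"already normalized": {input: "trial", want: "trial"},
		"upper case":         {input: "Trial", want: "trial"},
		"surrounding spaces": {input: "  trial ", want: "trial"},
	}
	fmt.Println("failed cases:", runCases(cases))
}
```

Because the iteration order changes between runs, a case that silently depends on state left behind by another case tends to fail intermittently and gets noticed early.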
|
||||
|
||||
## Modify controller APIs
|
||||
|
||||
If you want to modify Katib controller APIs, you have to
|
||||
generate deepcopy, clientset, listers, informers, OpenAPI, and Python SDK code for the changed APIs.
|
||||
You can update the necessary files as follows:
|
||||
|
||||
```bash
|
||||
make generate
|
||||
```
|
||||
|
||||
## Controller Flags
|
||||
|
||||
Below is a list of command-line flags accepted by Katib controller:
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
| ------------ | ------ | ------- | -------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| katib-config | string | "" | The katib-controller will load its initial configuration from this file. Omit this flag to use the default configuration values. |
|
||||
|
||||
## DB Manager Flags
|
||||
|
||||
Below is a list of command-line flags accepted by Katib DB Manager:
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
| --------------- | ------------- | -------------| ------------------------------------------------------------------- |
|
||||
| connect-timeout | time.Duration | 60s | Timeout before calling error during database connection |
|
||||
| listen-address | string | 0.0.0.0:6789 | The network interface or IP address to receive incoming connections |
|
||||
|
||||
## Katib admission webhooks
|
||||
|
||||
Katib uses three [Kubernetes admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/).
|
||||
|
||||
1. `validator.experiment.katib.kubeflow.org` -
|
||||
[Validating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook)
|
||||
to validate the Katib Experiment before it is created.
|
||||
|
||||
1. `defaulter.experiment.katib.kubeflow.org` -
|
||||
[Mutating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook)
|
||||
to set the [default values](../pkg/apis/controller/experiments/v1beta1/experiment_defaults.go)
|
||||
in the Katib Experiment before it is created.
|
||||
|
||||
1. `mutator.pod.katib.kubeflow.org` - Mutating admission webhook to inject the metrics
|
||||
collector sidecar container into the training pod. Learn more about Katib's
|
||||
metrics collector in the
|
||||
[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/).
|
||||
|
||||
You can find the YAMLs for the Katib webhooks
|
||||
[here](../manifests/v1beta1/components/webhook/webhooks.yaml).
|
||||
|
||||
**Note:** If you are using a private Kubernetes cluster, you have to allow traffic
|
||||
via `TCP:8443` by specifying the firewall rule and you have to update the master
|
||||
plane CIDR source range to use the Katib webhooks.
|
||||
|
||||
### Katib cert generator
|
||||
|
||||
Katib Controller has the internal `cert-generator` to generate certificates for the webhooks.
|
||||
|
||||
Once Katib is deployed in the Kubernetes cluster, the `cert-generator` follows these steps:
|
||||
|
||||
- Generate the self-signed certificate and private key.
|
||||
|
||||
- Update a Kubernetes Secret with the self-signed TLS certificate and private key.
|
||||
- Patch the webhooks with the `CABundle`.
|
||||
|
||||
Once the `cert-generator` has finished, the Katib controller starts registering controllers, such as `experiment-controller`, with the manager.
|
||||
|
||||
You can find the `cert-generator` source code [here](../pkg/certgenerator/v1beta1).
|
||||
|
||||
NOTE: Katib also supports using [cert-manager](https://cert-manager.io/) to generate certificates for the admission webhooks instead of the `cert-generator`.
|
||||
You can find the installation with the cert-manager [here](../manifests/v1beta1/installs/katib-cert-manager).
|
||||
|
||||
## Implement a new algorithm and use it in Katib
|
||||
|
||||
Please see [new-algorithm-service.md](./new-algorithm-service.md).
|
||||
|
||||
## Katib UI documentation
|
||||
|
||||
Please see [Katib UI README](../pkg/ui/v1beta1).
|
||||
|
||||
## Design proposals
|
||||
|
||||
Please see [proposals](./proposals).
|
||||
|
||||
## Code Style
|
||||
|
||||
### pre-commit
|
||||
|
||||
Make sure to install [pre-commit](https://pre-commit.com/) (`pip install
|
||||
pre-commit`) and run `pre-commit install` from the root of the repository at
|
||||
least once before creating git commits.
|
||||
|
||||
The pre-commit [hooks](../.pre-commit-config.yaml) ensure code quality and
|
||||
consistency. They are executed in CI. PRs that fail to comply with the hooks
|
||||
will not be able to pass the corresponding CI gate. The hooks are only executed
|
||||
against staged files unless you run `pre-commit run --all`, in which case,
|
||||
they'll be executed against every file in the repository.
|
||||
|
||||
Specific programmatically generated files listed in the `exclude` field in
|
||||
[.pre-commit-config.yaml](../.pre-commit-config.yaml) are deliberately excluded
|
||||
from the hooks.
|
|
@ -1,32 +0,0 @@
|
|||
# Copyright 2023 The Kubeflow Authors
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Dockerfile for building the source code of conformance tests
|
||||
FROM python:3.10-slim
|
||||
|
||||
WORKDIR /kubeflow/katib
|
||||
|
||||
COPY sdk/ /kubeflow/katib/sdk/
|
||||
COPY examples/ /kubeflow/katib/examples/
|
||||
COPY test/ /kubeflow/katib/test/
|
||||
COPY pkg/ /kubeflow/katib/pkg/
|
||||
|
||||
COPY conformance/run.sh .
|
||||
|
||||
# Add test script.
|
||||
RUN chmod +x run.sh
|
||||
|
||||
RUN pip install --prefer-binary -e sdk/python/v1beta1
|
||||
|
||||
ENTRYPOINT [ "./run.sh" ]
|
124
Makefile
|
@ -2,46 +2,45 @@ HAS_LINT := $(shell command -v golangci-lint;)
|
|||
HAS_YAMLLINT := $(shell command -v yamllint;)
|
||||
HAS_SHELLCHECK := $(shell command -v shellcheck;)
|
||||
HAS_SETUP_ENVTEST := $(shell command -v setup-envtest;)
|
||||
HAS_MOCKGEN := $(shell command -v mockgen;)
|
||||
|
||||
COMMIT := v1beta1-$(shell git rev-parse --short=7 HEAD)
|
||||
KATIB_REGISTRY := ghcr.io/kubeflow/katib
|
||||
CPU_ARCH ?= linux/amd64,linux/arm64
|
||||
ENVTEST_K8S_VERSION ?= 1.31
|
||||
MOCKGEN_VERSION ?= $(shell grep 'go.uber.org/mock' go.mod | cut -d ' ' -f 2)
|
||||
GO_VERSION=$(shell grep '^go' go.mod | cut -d ' ' -f 2)
|
||||
GOPATH ?= $(shell go env GOPATH)
|
||||
KATIB_REGISTRY := docker.io/kubeflowkatib
|
||||
CPU_ARCH ?= amd64
|
||||
ENVTEST_K8S_VERSION ?= 1.23
|
||||
|
||||
# for pytest
|
||||
PYTHONPATH := $(PYTHONPATH):$(CURDIR)/pkg/apis/manager/v1beta1/python:$(CURDIR)/pkg/apis/manager/health/python
|
||||
PYTHONPATH := $(PYTHONPATH):$(CURDIR)/pkg/metricscollector/v1beta1/common:$(CURDIR)/pkg/metricscollector/v1beta1/tfevent-metricscollector
|
||||
TEST_TENSORFLOW_EVENT_FILE_PATH ?= $(CURDIR)/test/unit/v1beta1/metricscollector/testdata/tfevent-metricscollector/logs
|
||||
|
||||
# Run tests
|
||||
.PHONY: test
|
||||
test: envtest
|
||||
KUBEBUILDER_ASSETS="$(shell setup-envtest use $(ENVTEST_K8S_VERSION) -p path)" go test ./pkg/... ./cmd/... -coverprofile coverage.out
|
||||
KUBEBUILDER_ASSETS="$(shell setup-envtest --arch=amd64 use $(ENVTEST_K8S_VERSION) -p path)" go test ./pkg/... ./cmd/... -coverprofile coverage.out
|
||||
|
||||
envtest:
|
||||
ifndef HAS_SETUP_ENVTEST
|
||||
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@release-0.19
|
||||
$(info "setup-envtest has been installed")
|
||||
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@bf71fc56485f6bf03e95ef6b0233ff36c695d4c9 # v0.11.2
|
||||
@echo "setup-envtest has been installed"
|
||||
endif
|
||||
$(info "setup-envtest has already been installed")
|
||||
@echo "setup-envtest has already been installed"
|
||||
|
||||
check: generated-codes go-mod fmt vet lint
|
||||
check: generate fmt vet lint
|
||||
|
||||
fmt:
|
||||
hack/verify-gofmt.sh
|
||||
|
||||
lint:
|
||||
ifndef HAS_LINT
|
||||
go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.64.7
|
||||
$(info "golangci-lint has been installed")
|
||||
go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.42.1
|
||||
@echo "golangci-lint has been installed"
|
||||
endif
|
||||
hack/verify-golangci-lint.sh
|
||||
|
||||
yamllint:
|
||||
ifndef HAS_YAMLLINT
|
||||
pip install --prefer-binary yamllint
|
||||
$(info "yamllint has been installed")
|
||||
pip install yamllint
|
||||
@echo "yamllint has been installed"
|
||||
endif
|
||||
hack/verify-yamllint.sh
|
||||
|
||||
|
@ -51,7 +50,7 @@ vet:
|
|||
shellcheck:
|
||||
ifndef HAS_SHELLCHECK
|
||||
bash hack/install-shellcheck.sh
|
||||
$(info "shellcheck has been installed")
|
||||
@echo "shellcheck has been installed"
|
||||
endif
|
||||
hack/verify-shellcheck.sh
|
||||
|
||||
|
@ -60,49 +59,25 @@ update:
|
|||
|
||||
# Deploy Katib v1beta1 manifests using Kustomize into a k8s cluster.
|
||||
deploy:
|
||||
bash scripts/v1beta1/deploy.sh $(WITH_DATABASE_TYPE)
|
||||
bash scripts/v1beta1/deploy.sh
|
||||
|
||||
# Undeploy Katib v1beta1 manifests using Kustomize from a k8s cluster
|
||||
undeploy:
|
||||
bash scripts/v1beta1/undeploy.sh
|
||||
|
||||
generated-codes: generate
|
||||
ifneq ($(shell bash hack/verify-generated-codes.sh '.'; echo $$?),0)
|
||||
$(error 'Please run "make generate" to generate codes')
|
||||
endif
|
||||
|
||||
go-mod: sync-go-mod
|
||||
ifneq ($(shell bash hack/verify-generated-codes.sh 'go.*'; echo $$?),0)
|
||||
$(error 'Please run "go mod tidy -go $(GO_VERSION)" to sync Go modules')
|
||||
endif
|
||||
|
||||
sync-go-mod:
|
||||
go mod tidy -go $(GO_VERSION)
|
||||
|
||||
.PHONY: go-mod-download
|
||||
go-mod-download:
|
||||
go mod download
|
||||
|
||||
CONTROLLER_GEN = $(shell pwd)/bin/controller-gen
|
||||
.PHONY: controller-gen
|
||||
controller-gen:
|
||||
@GOBIN=$(shell pwd)/bin GO111MODULE=on go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.16.5
|
||||
|
||||
# Run this if you update any existing controller APIs.
|
||||
# 1. Generate deepcopy, clientset, listers, informers for the APIs (hack/update-codegen.sh)
|
||||
# 1. Genereate deepcopy, clientset, listers, informers for the APIs (hack/update-codegen.sh)
|
||||
# 2. Generate open-api for the APIs (hack/update-openapigen)
|
||||
# 3. Generate Python SDK for Katib (hack/gen-python-sdk/gen-sdk.sh)
|
||||
# 4. Generate gRPC manager APIs (pkg/apis/manager/v1beta1/build.sh and pkg/apis/manager/health/build.sh)
|
||||
# 5. Generate Go mock codes
|
||||
generate: go-mod-download controller-gen
|
||||
ifndef HAS_MOCKGEN
|
||||
go install go.uber.org/mock/mockgen@$(MOCKGEN_VERSION)
|
||||
$(info "mockgen has been installed")
|
||||
generate:
|
||||
ifndef GOPATH
|
||||
$(error GOPATH not defined, please define GOPATH. Run "go help gopath" to learn more about GOPATH)
|
||||
endif
|
||||
go generate ./pkg/... ./cmd/...
|
||||
hack/gen-python-sdk/gen-sdk.sh
|
||||
hack/update-proto.sh
|
||||
hack/update-mockgen.sh
|
||||
pkg/apis/manager/v1beta1/build.sh
|
||||
pkg/apis/manager/health/build.sh
|
||||
|
||||
# Build images for the Katib v1beta1 components.
|
||||
build: generate
|
||||
|
@ -119,12 +94,14 @@ push-latest: generate
|
|||
bash scripts/v1beta1/push.sh $(KATIB_REGISTRY) $(COMMIT)
|
||||
|
||||
# Build and push Katib images for the given tag.
|
||||
push-tag:
|
||||
push-tag: generate
|
||||
ifeq ($(TAG),)
|
||||
$(error TAG must be set. Usage: make push-tag TAG=<release-tag>)
|
||||
endif
|
||||
bash scripts/v1beta1/build.sh $(KATIB_REGISTRY) $(TAG) $(CPU_ARCH)
|
||||
bash scripts/v1beta1/build.sh $(KATIB_REGISTRY) $(COMMIT) $(CPU_ARCH)
|
||||
bash scripts/v1beta1/push.sh $(KATIB_REGISTRY) $(TAG)
|
||||
bash scripts/v1beta1/push.sh $(KATIB_REGISTRY) $(COMMIT)
|
||||
|
||||
# Release a new version of Katib.
|
||||
release:
|
||||
|
@ -144,50 +121,31 @@ endif
|
|||
|
||||
# Prettier UI format check for Katib v1beta1.
|
||||
prettier-check:
|
||||
npm run format:check --prefix pkg/ui/v1beta1/frontend
|
||||
npm run format:check --prefix pkg/new-ui/v1beta1/frontend
|
||||
|
||||
# Update boilerplate for the source code.
|
||||
update-boilerplate:
|
||||
./hack/boilerplate/update-boilerplate.sh
|
||||
|
||||
prepare-pytest:
|
||||
pip install --prefer-binary -r test/unit/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/hyperopt/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/optuna/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/hyperband/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/nas/enas/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/nas/darts/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/pbt/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/earlystopping/medianstop/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/metricscollector/v1beta1/tfevent-metricscollector/requirements.txt
|
||||
# `TypeIs` was introduced in typing-extensions 4.10.0, and torch 2.6.0 requires typing-extensions>=4.10.0.
|
||||
# REF: https://github.com/kubeflow/katib/pull/2504
|
||||
# TODO (tenzen-y): Once we upgrade libraries depended on typing-extensions==4.5.0, we can remove this line.
|
||||
pip install typing-extensions==4.10.0
|
||||
pip install -r test/unit/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/chocolate/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/hyperopt/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/skopt/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/optuna/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/hyperband/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/nas/enas/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/nas/darts/v1beta1/requirements.txt
|
||||
pip install -r cmd/suggestion/pbt/v1beta1/requirements.txt
|
||||
pip install -r cmd/earlystopping/medianstop/v1beta1/requirements.txt
|
||||
pip install -r cmd/metricscollector/v1beta1/tfevent-metricscollector/requirements.txt
|
||||
|
||||
prepare-pytest-testdata:
|
||||
ifeq ("$(wildcard $(TEST_TENSORFLOW_EVENT_FILE_PATH))", "")
|
||||
python examples/v1beta1/trial-images/tf-mnist-with-summaries/mnist.py --epochs 5 --batch-size 200 --log-path $(TEST_TENSORFLOW_EVENT_FILE_PATH)
|
||||
endif
|
||||
|
||||
# TODO(Electronic-Waste): Remove the import rewrite when protobuf supports `python_package` option.
|
||||
# REF: https://github.com/protocolbuffers/protobuf/issues/7061
|
||||
pytest: prepare-pytest prepare-pytest-testdata
|
||||
pytest ./test/unit/v1beta1/suggestion --ignore=./test/unit/v1beta1/suggestion/test_skopt_service.py
|
||||
pytest ./test/unit/v1beta1/earlystopping
|
||||
pytest ./test/unit/v1beta1/metricscollector
|
||||
cp ./pkg/apis/manager/v1beta1/python/api_pb2.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2.py
|
||||
cp ./pkg/apis/manager/v1beta1/python/api_pb2_grpc.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
|
||||
sed -i "s/api_pb2/kubeflow\.katib\.katib_api_pb2/g" ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
|
||||
pytest ./sdk/python/v1beta1/kubeflow/katib
|
||||
rm ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
|
||||
|
||||
# The skopt service doesn't work appropriately with Python 3.11.
|
||||
# So, we need to run the test with Python 3.9.
|
||||
# TODO (tenzen-y): Once we stop supporting skopt, we can remove this test.
|
||||
# REF: https://github.com/kubeflow/katib/issues/2280
|
||||
pytest-skopt:
|
||||
pip install six
|
||||
pip install --prefer-binary -r test/unit/v1beta1/requirements.txt
|
||||
pip install --prefer-binary -r cmd/suggestion/skopt/v1beta1/requirements.txt
|
||||
pytest ./test/unit/v1beta1/suggestion/test_skopt_service.py
|
||||
PYTHONPATH=$(PYTHONPATH) pytest ./test/unit/v1beta1/suggestion
|
||||
PYTHONPATH=$(PYTHONPATH) pytest ./test/unit/v1beta1/earlystopping
|
||||
PYTHONPATH=$(PYTHONPATH) pytest ./test/unit/v1beta1/metricscollector
|
||||
|
|
4
OWNERS
|
@ -1,10 +1,10 @@
|
|||
approvers:
|
||||
- andreyvelich
|
||||
- gaocegege
|
||||
- hougangliu
|
||||
- johnugeorge
|
||||
reviewers:
|
||||
- anencore94
|
||||
- c-bata
|
||||
- Electronic-Waste
|
||||
emeritus_approvers:
|
||||
- sperlingxx
|
||||
- tenzen-y
|
||||
|
|
132
README.md
|
@ -1,18 +1,15 @@
|
|||
# Kubeflow Katib
|
||||
|
||||
[](https://github.com/kubeflow/katib/actions/workflows/test-go.yaml?branch=master)
|
||||
[](https://coveralls.io/github/kubeflow/katib?branch=master)
|
||||
[](https://goreportcard.com/report/github.com/kubeflow/katib)
|
||||
[](https://github.com/kubeflow/katib/releases)
|
||||
[](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels)
|
||||
[](https://www.bestpractices.dev/projects/9941)
|
||||
|
||||
<h1 align="center">
|
||||
<img src="./docs/images/logo-title.png" alt="logo" width="200">
|
||||
<br>
|
||||
</h1>
|
||||
|
||||
Kubeflow Katib is a Kubernetes-native project for automated machine learning (AutoML).
|
||||
[](https://github.com/kubeflow/katib/actions/workflows/test-go.yaml?branch=master)
|
||||
[](https://coveralls.io/github/kubeflow/katib?branch=master)
|
||||
[](https://goreportcard.com/report/github.com/kubeflow/katib)
|
||||
[](https://github.com/kubeflow/katib/releases)
|
||||
[](https://kubeflow.slack.com/archives/C018PMV53NW)
|
||||
|
||||
Katib is a Kubernetes-native project for automated machine learning (AutoML).
|
||||
Katib supports
|
||||
[Hyperparameter Tuning](https://en.wikipedia.org/wiki/Hyperparameter_optimization),
|
||||
[Early Stopping](https://en.wikipedia.org/wiki/Early_stopping) and
|
||||
|
@ -21,7 +18,8 @@ Katib supports
|
|||
Katib is agnostic to machine learning (ML) frameworks.
|
||||
It can tune hyperparameters of applications written in any language of the
|
||||
users’ choice and natively supports many ML frameworks, such as
|
||||
[TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [XGBoost](https://xgboost.readthedocs.io/en/latest/), and others.
|
||||
[TensorFlow](https://www.tensorflow.org/), [Apache MXNet](https://mxnet.apache.org/),
|
||||
[PyTorch](https://pytorch.org/), [XGBoost](https://xgboost.readthedocs.io/en/latest/), and others.
|
||||
|
||||
Katib can perform training jobs using any Kubernetes
|
||||
[Custom Resources](https://www.kubeflow.org/docs/components/katib/trial-template/)
|
||||
|
@ -31,13 +29,13 @@ and many more.
|
|||
|
||||
Katib stands for `secretary` in Arabic.
|
||||
|
||||
## Search Algorithms
|
||||
# Search Algorithms
|
||||
|
||||
Katib supports several search algorithms. Follow the
|
||||
[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-algorithm/#hp-tuning-algorithms)
|
||||
[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/experiment/#search-algorithms-in-detail)
|
||||
to know more about each algorithm and check the
|
||||
[this guide](https://www.kubeflow.org/docs/components/katib/user-guides/hp-tuning/configure-algorithm/#use-custom-algorithm-in-katib)
|
||||
to implement your custom algorithm.
|
||||
[Suggestion service guide](/docs/new-algorithm-service.md) to implement your
|
||||
custom algorithm.
|
||||
|
||||
<table>
|
||||
<tbody>
|
||||
|
@ -139,68 +137,102 @@ to implement your custom algorithm.
|
|||
</tbody>
|
||||
</table>
|
||||
|
||||
To perform the above algorithms Katib supports the following frameworks:
|
||||
To perform above algorithms Katib supports the following frameworks:
|
||||
|
||||
- [Chocolate](https://github.com/AIworx-Labs/chocolate)
|
||||
- [Goptuna](https://github.com/c-bata/goptuna)
|
||||
- [Hyperopt](https://github.com/hyperopt/hyperopt)
|
||||
- [Optuna](https://github.com/optuna/optuna)
|
||||
- [Scikit Optimize](https://github.com/scikit-optimize/scikit-optimize)
|
||||
|
||||
# Installation
|
||||
|
||||
For the various Katib installs check the
|
||||
[Kubeflow guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-setup).
|
||||
Follow the next steps to install Katib standalone.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Please check [the official Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/installation/#prerequisites)
|
||||
for prerequisites to install Katib.
|
||||
These are the minimal requirements to install Katib:
|
||||
|
||||
## Installation
|
||||
- Kubernetes >= 1.21
|
||||
- `kubectl` >= 1.21
|
||||
|
||||
Please follow [the Kubeflow Katib guide](https://www.kubeflow.org/docs/components/katib/installation/#installing-katib)
|
||||
for the detailed instructions on how to install Katib.
|
||||
## Latest Version
|
||||
|
||||
### Installing the Control Plane
|
||||
|
||||
Run the following command to install the latest stable release of Katib control plane:
|
||||
|
||||
```
|
||||
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.17.0"
|
||||
```
|
||||
|
||||
Run the following command to install the latest changes of Katib control plane:
|
||||
For the latest Katib version run this command:
|
||||
|
||||
```
|
||||
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"
|
||||
```
|
||||
|
||||
For the Katib Experiments check the [complete examples list](./examples/v1beta1).
|
||||
## Release Version
|
||||
|
||||
### Installing the Python SDK
|
||||
For the specific Katib release (for example `v0.13.0`) run this command:
|
||||
|
||||
Katib implements [a Python SDK](https://pypi.org/project/kubeflow-katib/) to simplify creation of
|
||||
hyperparameter tuning jobs for Data Scientists.
|
||||
|
||||
Run the following command to install the latest stable release of Katib SDK:
|
||||
|
||||
```sh
|
||||
pip install -U kubeflow-katib
|
||||
```
|
||||
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.13.0"
|
||||
```
|
||||
|
||||
## Getting Started
|
||||
Make sure that all Katib components are running:
|
||||
|
||||
Please refer to [the getting started guide](https://www.kubeflow.org/docs/components/katib/getting-started/#getting-started-with-katib-python-sdk)
|
||||
to quickly create your first hyperparameter tuning Experiment using the Python SDK.
|
||||
```
|
||||
$ kubectl get pods -n kubeflow
|
||||
|
||||
## Community
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
katib-cert-generator-rw95w 0/1 Completed 0 35s
|
||||
katib-controller-566595bdd8-hbxgf 1/1 Running 0 36s
|
||||
katib-db-manager-57cd769cdb-4g99m 1/1 Running 0 36s
|
||||
katib-mysql-7894994f88-5d4s5 1/1 Running 0 36s
|
||||
katib-ui-5767cfccdc-pwg2x 1/1 Running 0 36s
|
||||
```
|
||||
|
||||
The following links provide information on how to get involved in the community:
|
||||
For the Katib Experiments check the [complete examples list](./examples/v1beta1).
|
||||
|
||||
- Attend [the bi-weekly AutoML and Training Working Group](https://bit.ly/2PWVCkV)
|
||||
community meeting.
|
||||
- Join our [`#kubeflow-katib`](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels)
|
||||
Slack channel.
|
||||
- Check out [who is using Katib](ADOPTERS.md) and [presentations about Katib project](docs/presentations.md).
|
||||
# Documentation
|
||||
|
||||
- Run your first Katib Experiment in the
|
||||
[getting started guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#example-using-random-algorithm).
|
||||
|
||||
- Learn about Katib **Concepts** in this
|
||||
[guide](https://www.kubeflow.org/docs/components/katib/overview/#katib-concepts).
|
||||
|
||||
- Learn about Katib **Interfaces** in this
|
||||
[guide](https://www.kubeflow.org/docs/components/katib/overview/#katib-interfaces).
|
||||
|
||||
- Learn about Katib **Components** in this
|
||||
[guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-components).
|
||||
|
||||
- Know more about Katib in the [presentations and demos list](./docs/presentations.md).
|
||||
|
||||
# Community
|
||||
|
||||
We are always growing our community and invite new users and AutoML enthusiasts
|
||||
to contribute to the Katib project. The following links provide information
|
||||
about getting involved in the community:
|
||||
|
||||
- Subscribe to the
|
||||
[AutoML calendar](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)
|
||||
to attend Working Group bi-weekly community meetings.
|
||||
|
||||
- Check the
|
||||
[AutoML and Training Working Group meeting notes](https://docs.google.com/document/d/1MChKfzrKAeFRtYqypFbMXL6ZIc_OgijjkvbqmwRV-64/edit).
|
||||
|
||||
- If you use Katib, please update [the adopters list](ADOPTERS.md).
|
||||
|
||||
## Contributing
|
||||
|
||||
Please refer to the [CONTRIBUTING guide](CONTRIBUTING.md).
|
||||
Please feel free to test the system! [Developer guide](./docs/developer-guide.md)
|
||||
is a good starting point for our developers.
|
||||
|
||||
## Blog posts
|
||||
|
||||
- [Kubeflow Katib: Scalable, Portable and Cloud Native System for AutoML](https://blog.kubeflow.org/katib/)
|
||||
(by Andrey Velichkevich)
|
||||
|
||||
## Events
|
||||
|
||||
- [AutoML and Training WG Summit. 16th of July 2021](https://docs.google.com/document/d/1vGluSPHmAqEr8k9Dmm82RcQ-MVnqbYYSfnjMGB-aPuo/edit?usp=sharing)
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
44
ROADMAP.md
|
@ -1,45 +1,3 @@
|
|||
# Katib 2022/2023 Roadmap
|
||||
|
||||
## AutoML Features
|
||||
|
||||
- Support advanced HyperParameter tuning algorithms:
|
||||
|
||||
- Population Based Training (PBT) - [#1382](https://github.com/kubeflow/katib/issues/1382)
|
||||
- Tree of Parzen Estimators (TPE)
|
||||
- Multivariate TPE
|
||||
- Sobol’s Quasirandom Sequence
|
||||
- Asynchronous Successive Halving - [ASHA](https://arxiv.org/pdf/1810.05934.pdf)
|
||||
|
||||
- Support multi-objective optimization - [#1549](https://github.com/kubeflow/katib/issues/1549)
|
||||
- Support various HP distributions (log-uniform, uniform, normal) - [#1207](https://github.com/kubeflow/katib/issues/1207)
|
||||
- Support Auto Model Compression - [#460](https://github.com/kubeflow/katib/issues/460)
|
||||
- Support Auto Feature Engineering - [#475](https://github.com/kubeflow/katib/issues/475)
|
||||
- Improve Neural Architecture Search design
|
||||
|
||||
## Backend and API Enhancements
|
||||
|
||||
- Conformance tests for Katib - [#2044](https://github.com/kubeflow/katib/issues/2044)
|
||||
- Support push-based metrics collection in Katib - [#577](https://github.com/kubeflow/katib/issues/577)
|
||||
- Support PostgreSQL as a Katib DB - [#915](https://github.com/kubeflow/katib/issues/915)
|
||||
- Improve Katib scalability - [#1847](https://github.com/kubeflow/katib/issues/1847)
|
||||
- Promote Katib APIs to the `v1` version
|
||||
- Support multiple CRD versions (`v1beta1`, `v1`) with conversion webhook
|
||||
|
||||
## Improve Katib User Experience
|
||||
|
||||
- Simplify Katib Experiment creation with Katib SDK - [#1951](https://github.com/kubeflow/katib/pull/1951)
|
||||
- Fully migrate to a new Katib UI - [Project 1](https://github.com/kubeflow/katib/projects/1)
|
||||
- Expose Trial logs in Katib UI - [#971](https://github.com/kubeflow/katib/issues/971)
|
||||
- Enhance Katib UI visualization metrics for AutoML Experiments
|
||||
- Improve Katib Config UX - [#2150](https://github.com/kubeflow/katib/issues/2150)
|
||||
|
||||
## Integration with Kubeflow Components
|
||||
|
||||
- Kubeflow Pipeline as a Katib Trial target - [#1914](https://github.com/kubeflow/katib/issues/1914)
|
||||
- Improve data passing when Katib Experiment is part of Kubeflow Pipeline - [#1846](https://github.com/kubeflow/katib/issues/1846)
|
||||
|
||||
# History
|
||||
|
||||
# Katib 2021 Roadmap
|
||||
|
||||
## New Features
|
||||
|
@ -66,6 +24,8 @@
|
|||
- Support multiple CRD version with conversion webhook
|
||||
- MLMD integration with Katib Experiments
|
||||
|
||||
# History
|
||||
|
||||
# Katib 2020 Roadmap
|
||||
|
||||
## New Features
|
||||
|
|
64
SECURITY.md
|
@ -1,64 +0,0 @@
|
|||
# Security Policy
|
||||
|
||||
## Supported Versions
|
||||
|
||||
Kubeflow Katib versions are expressed as `vX.Y.Z`, where X is the major version,
|
||||
Y is the minor version, and Z is the patch version, following the
|
||||
[Semantic Versioning](https://semver.org/) terminology.
|
||||
|
||||
The Kubeflow Katib project maintains release branches for the most recent two minor releases.
|
||||
Applicable fixes, including security fixes, may be backported to those two release branches,
|
||||
depending on severity and feasibility.
|
||||
|
||||
Users are encouraged to stay updated with the latest releases to benefit from security patches and
|
||||
improvements.
|
||||
|
||||
## Reporting a Vulnerability
|
||||
|
||||
We're extremely grateful for security researchers and users that report vulnerabilities to the
|
||||
Kubeflow Open Source Community. All reports are thoroughly investigated by Kubeflow projects owners.
|
||||
|
||||
You can use the following ways to report security vulnerabilities privately:
|
||||
|
||||
- Using the Kubeflow Katib repository [GitHub Security Advisory](https://github.com/kubeflow/katib/security/advisories/new).
|
||||
- Using our private Kubeflow Steering Committee mailing list: ksc@kubeflow.org.
|
||||
|
||||
Please provide detailed information to help us understand and address the issue promptly.
|
||||
|
||||
## Disclosure Process
|
||||
|
||||
**Acknowledgment**: We will acknowledge receipt of your report within 10 business days.
|
||||
|
||||
**Assessment**: The Kubeflow projects owners will investigate the reported issue to determine its
|
||||
validity and severity.
|
||||
|
||||
**Resolution**: If the issue is confirmed, we will work on a fix and prepare a release.
|
||||
|
||||
**Notification**: Once a fix is available, we will notify the reporter and coordinate a public
|
||||
disclosure.
|
||||
|
||||
**Public Disclosure**: Details of the vulnerability and the fix will be published in the project's
|
||||
release notes and communicated through appropriate channels.
|
||||
|
||||
## Prevention Mechanisms
|
||||
|
||||
Kubeflow Katib employs several measures to prevent security issues:
|
||||
|
||||
**Code Reviews**: All code changes are reviewed by maintainers to ensure code quality and security.
|
||||
|
||||
**Dependency Management**: Regular updates and monitoring of dependencies (e.g. Dependabot) to
|
||||
address known vulnerabilities.
|
||||
|
||||
**Continuous Integration**: Automated testing and security checks are integrated into the CI/CD pipeline.
|
||||
|
||||
**Image Scanning**: Container images are scanned for vulnerabilities.
|
||||
|
||||
## Communication Channels
|
||||
|
||||
For general questions, please use the following resources:
|
||||
|
||||
- Kubeflow [Slack channels](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels).
|
||||
|
||||
- Kubeflow discuss [mailing list](https://www.kubeflow.org/docs/about/community/#kubeflow-mailing-list).
|
||||
|
||||
Please **do not report** security vulnerabilities through public channels.
|
|
@ -0,0 +1,29 @@
|
|||
# Build the Katib Cert Generator.
|
||||
FROM golang:alpine AS build-env
|
||||
|
||||
WORKDIR /go/src/github.com/kubeflow/katib
|
||||
|
||||
# Download packages.
|
||||
COPY go.mod .
|
||||
COPY go.sum .
|
||||
RUN go mod download -x
|
||||
|
||||
# Copy sources.
|
||||
COPY cmd/ cmd/
|
||||
COPY pkg/ pkg/
|
||||
|
||||
# Build the binary.
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-cert-generator ./cmd/cert-generator/v1beta1; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-cert-generator ./cmd/cert-generator/v1beta1; \
|
||||
else \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-cert-generator ./cmd/cert-generator/v1beta1; \
|
||||
fi
|
||||
|
||||
# Copy the cert-generator into a thin image.
|
||||
FROM gcr.io/distroless/static:nonroot
|
||||
WORKDIR /app
|
||||
COPY --from=build-env /go/src/github.com/kubeflow/katib/katib-cert-generator /app/
|
||||
USER 65532:65532
|
||||
ENTRYPOINT ["./katib-cert-generator"]
|
|
@ -0,0 +1,42 @@
|
|||
/*
|
||||
Copyright 2022 The Kubeflow Authors.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
*/
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"github.com/kubeflow/katib/pkg/cert-generator/v1beta1"
|
||||
"k8s.io/client-go/kubernetes/scheme"
|
||||
"k8s.io/klog"
|
||||
"os"
|
||||
"sigs.k8s.io/controller-runtime/pkg/client"
|
||||
"sigs.k8s.io/controller-runtime/pkg/client/config"
|
||||
)
|
||||
|
||||
func main() {
|
||||
kubeClient, err := client.New(config.GetConfigOrDie(), client.Options{Scheme: scheme.Scheme})
|
||||
if err != nil {
|
||||
klog.Fatalf("Failed to create kube client: %v", err)
|
||||
}
|
||||
|
||||
cmd, err := v1beta1.NewKatibCertGeneratorCmd(kubeClient)
|
||||
if err != nil {
|
||||
klog.Fatalf("Failed to generate cert: %v", err)
|
||||
}
|
||||
|
||||
if err = cmd.Execute(); err != nil {
|
||||
os.Exit(1)
|
||||
}
|
||||
}
|
|
@ -1,7 +1,7 @@
|
|||
# Build the Katib DB manager.
|
||||
FROM golang:alpine AS build-env
|
||||
|
||||
ARG TARGETARCH
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
|
||||
WORKDIR /go/src/github.com/kubeflow/katib
|
||||
|
||||
|
@@ -15,10 +15,28 @@ COPY cmd/ cmd/
 COPY pkg/ pkg/

 # Build the binary.
-RUN CGO_ENABLED=0 GOOS=linux GOARCH="${TARGETARCH}" go build -a -o katib-db-manager ./cmd/db-manager/v1beta1
+RUN if [ "$(uname -m)" = "ppc64le" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-db-manager ./cmd/db-manager/v1beta1; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-db-manager ./cmd/db-manager/v1beta1; \
+    else \
+    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-db-manager ./cmd/db-manager/v1beta1; \
+    fi

+# Add GRPC health probe.
+RUN if [ "$(uname -m)" = "ppc64le" ]; then \
+    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
+    else \
+    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
+    fi && \
+    chmod +x /bin/grpc_health_probe
+
 # Copy the db-manager into a thin image.
 FROM alpine:3.15
 WORKDIR /app
+COPY --from=build-env /bin/grpc_health_probe /bin/
 COPY --from=build-env /go/src/github.com/kubeflow/katib/katib-db-manager /app/
 ENTRYPOINT ["./katib-db-manager"]
+CMD ["-w", "kubernetes"]
@@ -22,21 +22,19 @@ import (
 	"fmt"
 	"net"
 	"os"
-	"time"

 	health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
 	api_pb "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
 	db "github.com/kubeflow/katib/pkg/db/v1beta1"
 	"github.com/kubeflow/katib/pkg/db/v1beta1/common"
-	"k8s.io/klog/v2"
+	"k8s.io/klog"

 	"google.golang.org/grpc"
 	"google.golang.org/grpc/reflection"
 )

 const (
-	defaultListenAddress = "0.0.0.0:6789"
-	defaultConnectTimeout = time.Second * 60
+	port = "0.0.0.0:6789"
 )

 var dbIf common.KatibDBInterface
@@ -89,30 +87,25 @@ func (s *server) Check(ctx context.Context, in *health_pb.HealthCheckRequest) (*
 }

 func main() {
-	var connectTimeout time.Duration
-	var listenAddress string
-	flag.DurationVar(&connectTimeout, "connect-timeout", defaultConnectTimeout, "Timeout before calling error during database connection. (e.g. 120s)")
-	flag.StringVar(&listenAddress, "listen-address", defaultListenAddress, "The network interface or IP address to receive incoming connections. (e.g. 0.0.0.0:6789)")
 	flag.Parse()
-
 	var err error
 	dbNameEnvName := common.DBNameEnvName
 	dbName := os.Getenv(dbNameEnvName)
 	if dbName == "" {
 		klog.Fatal("DB_NAME env is not set. Exiting")
 	}
-	dbIf, err = db.NewKatibDBInterface(dbName, connectTimeout)
+	dbIf, err = db.NewKatibDBInterface(dbName)
 	if err != nil {
 		klog.Fatalf("Failed to open db connection: %v", err)
 	}
 	dbIf.DBInit()
-	listener, err := net.Listen("tcp", listenAddress)
+	listener, err := net.Listen("tcp", port)
 	if err != nil {
 		klog.Fatalf("Failed to listen: %v", err)
 	}

 	size := 1<<31 - 1
-	klog.Infof("Start Katib manager: %s", listenAddress)
+	klog.Infof("Start Katib manager: %s", port)
 	s := grpc.NewServer(grpc.MaxRecvMsgSize(size), grpc.MaxSendMsgSize(size))
 	api_pb.RegisterDBManagerServer(s, &server{})
 	health_pb.RegisterHealthServer(s, &server{})
@@ -20,7 +20,7 @@ import (
 	"context"
 	"testing"

-	"go.uber.org/mock/gomock"
+	"github.com/golang/mock/gomock"

 	health_pb "github.com/kubeflow/katib/pkg/apis/manager/health"
 	api_pb "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
@@ -1,11 +1,9 @@
-FROM python:3.11-slim
+FROM python:3.9-slim

-ARG TARGETARCH
 ENV TARGET_DIR /opt/katib
 ENV EARLY_STOPPING_DIR cmd/earlystopping/medianstop/v1beta1
-ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python

-RUN if [ "${TARGETARCH}" = "ppc64le" ] || [ "${TARGETARCH}" = "arm64" ]; then \
+RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
     apt-get -y update && \
     apt-get -y install gfortran libopenblas-dev liblapack-dev && \
     apt-get clean && \
@@ -14,11 +12,12 @@ RUN if [ "${TARGETARCH}" = "ppc64le" ] || [ "${TARGETARCH}" = "arm64" ]; then \

 ADD ./pkg/ ${TARGET_DIR}/pkg/
 ADD ./${EARLY_STOPPING_DIR}/ ${TARGET_DIR}/${EARLY_STOPPING_DIR}/

 WORKDIR ${TARGET_DIR}/${EARLY_STOPPING_DIR}
-RUN pip install --no-cache-dir -r requirements.txt
-
+RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
 RUN chgrp -R 0 ${TARGET_DIR} \
   && chmod -R g+rwX ${TARGET_DIR}

+ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python
+
 ENTRYPOINT ["python", "main.py"]
@@ -12,14 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import logging
-import time
-from concurrent import futures
-
 import grpc
-
+import time
+import logging
 from pkg.apis.manager.v1beta1.python import api_pb2_grpc
 from pkg.earlystopping.v1beta1.medianstop.service import MedianStopService
+from concurrent import futures

 _ONE_DAY_IN_SECONDS = 60 * 60 * 24
 DEFAULT_PORT = "0.0.0.0:6788"
@@ -1,5 +1,5 @@
-grpcio>=1.64.1
-protobuf>=4.21.12,<5
+grpcio==1.41.1
+protobuf==3.19.1
 googleapis-common-protos==1.6.0
 kubernetes==22.6.0
 cython>=0.29.24
@@ -1,8 +1,6 @@
 # Build the Katib controller.
 FROM golang:alpine AS build-env

-ARG TARGETARCH
-
 WORKDIR /go/src/github.com/kubeflow/katib

 # Download packages.
@@ -15,7 +13,13 @@ COPY cmd/ cmd/
 COPY pkg/ pkg/

 # Build the binary.
-RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o katib-controller ./cmd/katib-controller/v1beta1
+RUN if [ "$(uname -m)" = "ppc64le" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-controller ./cmd/katib-controller/v1beta1; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-controller ./cmd/katib-controller/v1beta1; \
+    else \
+    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-controller ./cmd/katib-controller/v1beta1; \
+    fi

 # Copy the controller-manager into a thin image.
 FROM alpine:3.15
@@ -15,7 +15,7 @@ limitations under the License.
 */

 /*
-Katib-controller is a controller (operator) for Experiments and Trials
+Katib-controller is a controller (operator) for Experiments and Trials
 */
 package main

@@ -24,75 +24,64 @@ import (
 	"os"

 	"github.com/spf13/viper"
-	"k8s.io/apimachinery/pkg/runtime"
 	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
 	"sigs.k8s.io/controller-runtime/pkg/client/config"
-	"sigs.k8s.io/controller-runtime/pkg/healthz"
 	logf "sigs.k8s.io/controller-runtime/pkg/log"
 	"sigs.k8s.io/controller-runtime/pkg/log/zap"
 	"sigs.k8s.io/controller-runtime/pkg/manager"
 	"sigs.k8s.io/controller-runtime/pkg/manager/signals"
-	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
-	"sigs.k8s.io/controller-runtime/pkg/webhook"

-	configv1beta1 "github.com/kubeflow/katib/pkg/apis/config/v1beta1"
 	apis "github.com/kubeflow/katib/pkg/apis/controller"
-	cert "github.com/kubeflow/katib/pkg/certgenerator/v1beta1"
-	"github.com/kubeflow/katib/pkg/controller.v1beta1"
+	controller "github.com/kubeflow/katib/pkg/controller.v1beta1"
 	"github.com/kubeflow/katib/pkg/controller.v1beta1/consts"
-	"github.com/kubeflow/katib/pkg/util/v1beta1/katibconfig"
-	webhookv1beta1 "github.com/kubeflow/katib/pkg/webhook/v1beta1"
-	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
-	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
+	trialutil "github.com/kubeflow/katib/pkg/controller.v1beta1/trial/util"
+	webhook "github.com/kubeflow/katib/pkg/webhook/v1beta1"
 )

-var (
-	scheme = runtime.NewScheme()
-	log = logf.Log.WithName("entrypoint")
-)
-
-func init() {
-	utilruntime.Must(apis.AddToScheme(scheme))
-	utilruntime.Must(configv1beta1.AddToScheme(scheme))
-	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
-}
-
 func main() {
 	logf.SetLogger(zap.New())
+	log := logf.Log.WithName("entrypoint")
+
+	var experimentSuggestionName string
+	var metricsAddr string
+	var webhookPort int
+	var injectSecurityContext bool
+	var enableGRPCProbeInSuggestion bool
+	var trialResources trialutil.GvkListFlag
+	var enableLeaderElection bool
+	var leaderElectionID string
+
+	flag.StringVar(&experimentSuggestionName, "experiment-suggestion-name",
+		"default", "The implementation of suggestion interface in experiment controller (default)")
+	flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
+	flag.BoolVar(&injectSecurityContext, "webhook-inject-securitycontext", false, "Inject the securityContext of container[0] in the sidecar")
+	flag.BoolVar(&enableGRPCProbeInSuggestion, "enable-grpc-probe-in-suggestion", true, "enable grpc probe in suggestions")
+	flag.Var(&trialResources, "trial-resources", "The list of resources that can be used as trial template, in the form: Kind.version.group (e.g. TFJob.v1.kubeflow.org)")
+	flag.IntVar(&webhookPort, "webhook-port", 8443, "The port number to be used for admission webhook server.")
+	// For leader election
+	flag.BoolVar(&enableLeaderElection, "enable-leader-election", false, "Enable leader election for katib-controller. Enabling this will ensure there is only one active katib-controller.")
+	flag.StringVar(&leaderElectionID, "leader-election-id", "3fbc96e9.katib.kubeflow.org", "The ID for leader election.")
+
+	// TODO (andreyvelich): Currently it is not possible to set different webhook service name.
+	// flag.StringVar(&serviceName, "webhook-service-name", "katib-controller", "The service name which will be used in webhook")
+	// TODO (andreyvelich): Currently is is not possible to store webhook cert in the local file system.
+	// flag.BoolVar(&certLocalFS, "cert-localfs", false, "Store the webhook cert in local file system")

-	var katibConfigFile string
-	flag.StringVar(&katibConfigFile, "katib-config", "",
-		"The katib-controller will load its initial configuration from this file. "+
-			"Omit this flag to use the default configuration values. ")
 	flag.Parse()

-	initConfig, err := katibconfig.GetInitConfigData(scheme, katibConfigFile)
-	if err != nil {
-		log.Error(err, "Failed to get KatibConfig")
-		os.Exit(1)
-	}
-
 	// Set the config in viper.
-	viper.Set(consts.ConfigExperimentSuggestionName, initConfig.ControllerConfig.ExperimentSuggestionName)
-	viper.Set(consts.ConfigInjectSecurityContext, initConfig.ControllerConfig.InjectSecurityContext)
-	viper.Set(consts.ConfigEnableGRPCProbeInSuggestion, initConfig.ControllerConfig.EnableGRPCProbeInSuggestion)
-
-	trialGVKs, err := katibconfig.TrialResourcesToGVKs(initConfig.ControllerConfig.TrialResources)
-	if err != nil {
-		log.Error(err, "Failed to parse trialResources")
-		os.Exit(1)
-	}
-	viper.Set(consts.ConfigTrialResources, trialGVKs)
+	viper.Set(consts.ConfigExperimentSuggestionName, experimentSuggestionName)
+	viper.Set(consts.ConfigInjectSecurityContext, injectSecurityContext)
+	viper.Set(consts.ConfigEnableGRPCProbeInSuggestion, enableGRPCProbeInSuggestion)
+	viper.Set(consts.ConfigTrialResources, trialResources)

 	log.Info("Config:",
 		consts.ConfigExperimentSuggestionName,
 		viper.GetString(consts.ConfigExperimentSuggestionName),
 		"webhook-port",
-		initConfig.ControllerConfig.WebhookPort,
+		webhookPort,
 		"metrics-addr",
-		initConfig.ControllerConfig.MetricsAddr,
-		"healthz-addr",
-		initConfig.ControllerConfig.HealthzAddr,
+		metricsAddr,
 		consts.ConfigInjectSecurityContext,
 		viper.GetBool(consts.ConfigInjectSecurityContext),
 		consts.ConfigEnableGRPCProbeInSuggestion,
@@ -110,13 +99,9 @@ func main() {

 	// Create a new katib controller to provide shared dependencies and start components
 	mgr, err := manager.New(cfg, manager.Options{
-		Metrics: metricsserver.Options{
-			BindAddress: initConfig.ControllerConfig.MetricsAddr,
-		},
-		HealthProbeBindAddress: initConfig.ControllerConfig.HealthzAddr,
-		LeaderElection: initConfig.ControllerConfig.EnableLeaderElection,
-		LeaderElectionID: initConfig.ControllerConfig.LeaderElectionID,
-		Scheme: scheme,
+		MetricsBindAddress: metricsAddr,
+		LeaderElection: enableLeaderElection,
+		LeaderElectionID: leaderElectionID,
 	})
 	if err != nil {
 		log.Error(err, "Failed to create the manager")
@@ -125,50 +110,11 @@ func main() {

 	log.Info("Registering Components.")

-	// Create a webhook server.
-	hookServer := webhook.NewServer(webhook.Options{
-		Port: *initConfig.ControllerConfig.WebhookPort,
-		CertDir: consts.CertDir,
-	})
-
-	ctx := signals.SetupSignalHandler()
-	certsReady := make(chan struct{})
-	defer close(certsReady)
-
-	// The setupControllers will register controllers to the manager
-	// after generated certs for the admission webhooks.
-	go setupControllers(mgr, certsReady, hookServer)
-
-	if initConfig.CertGeneratorConfig.Enable {
-		if err = cert.AddToManager(mgr, initConfig.CertGeneratorConfig, certsReady); err != nil {
-			log.Error(err, "Failed to set up cert-generator")
-		}
-	} else {
-		certsReady <- struct{}{}
-	}
-
-	log.Info("Setting up health checker.")
-	if err := mgr.AddReadyzCheck("readyz", hookServer.StartedChecker()); err != nil {
-		log.Error(err, "Unable to add readyz endpoint to the manager")
+	// Setup Scheme for all resources
+	if err := apis.AddToScheme(mgr.GetScheme()); err != nil {
+		log.Error(err, "Unable to add APIs to scheme")
 		os.Exit(1)
 	}
-	if err = mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
-		log.Error(err, "Add webhook server health checker to the manager failed")
-		os.Exit(1)
-	}
-
-	// Start the Cmd
-	log.Info("Starting the manager.")
-	if err = mgr.Start(ctx); err != nil {
-		log.Error(err, "Unable to run the manager")
-		os.Exit(1)
-	}
-}
-
-func setupControllers(mgr manager.Manager, certsReady chan struct{}, hookServer webhook.Server) {
-	// The certsReady blocks to register controllers until generated certs.
-	<-certsReady
-	log.Info("Certs ready")

 	// Setup all Controllers
 	log.Info("Setting up controller.")
@@ -178,8 +124,15 @@ func setupControllers(mgr manager.Manager, certsReady chan struct{}, hookServer
 	}

 	log.Info("Setting up webhooks.")
-	if err := webhookv1beta1.AddToManager(mgr, hookServer); err != nil {
+	if err := webhook.AddToManager(mgr, webhookPort); err != nil {
 		log.Error(err, "Unable to register webhooks to the manager")
 		os.Exit(1)
 	}
+
+	// Start the Cmd
+	log.Info("Starting the Cmd.")
+	if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
+		log.Error(err, "Unable to run the manager")
+		os.Exit(1)
+	}
 }
@@ -1,8 +1,6 @@
 # Build the Katib file metrics collector.
 FROM golang:alpine AS build-env

-ARG TARGETARCH
-
 WORKDIR /go/src/github.com/kubeflow/katib

 # Download packages.
@@ -15,7 +13,13 @@ COPY cmd/ cmd/
 COPY pkg/ pkg/

 # Build the binary.
-RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o file-metricscollector ./cmd/metricscollector/v1beta1/file-metricscollector
+RUN if [ "$(uname -m)" = "ppc64le" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o file-metricscollector ./cmd/metricscollector/v1beta1/file-metricscollector; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o file-metricscollector ./cmd/metricscollector/v1beta1/file-metricscollector; \
+    else \
+    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o file-metricscollector ./cmd/metricscollector/v1beta1/file-metricscollector; \
+    fi

 # Copy the file metrics collector into a thin image.
 FROM alpine:3.15
@@ -42,6 +42,7 @@ import (
 	"encoding/json"
 	"flag"
 	"fmt"
+	"io/ioutil"
 	"os"
 	"path/filepath"
 	"regexp"
@@ -49,11 +50,11 @@ import (
 	"strings"
 	"time"

-	"github.com/nxadm/tail"
+	"github.com/hpcloud/tail"
 	psutil "github.com/shirou/gopsutil/v3/process"
 	"google.golang.org/grpc"
 	"google.golang.org/grpc/credentials/insecure"
-	"k8s.io/klog/v2"
+	"k8s.io/klog"

 	commonv1beta1 "github.com/kubeflow/katib/pkg/apis/controller/common/v1beta1"
 	api "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
@@ -134,11 +135,7 @@ func printMetricsFile(mFile string) {
 	checkMetricFile(mFile)

 	// Print lines from metrics file.
-	t, err := tail.TailFile(mFile, tail.Config{Follow: true, ReOpen: true})
-	if err != nil {
-		klog.Errorf("Failed to open metrics file: %v", err)
-	}
-
+	t, _ := tail.TailFile(mFile, tail.Config{Follow: true})
 	for line := range t.Lines {
 		klog.Info(line.Text)
 	}
@@ -164,9 +161,7 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string, f
 	checkMetricFile(mFile)

 	// Get Main process.
-	// Extract the metric file dir path based on the file name.
-	mDirPath, _ := filepath.Split(mFile)
-	_, mainProcPid, err := common.GetMainProcesses(mDirPath)
+	_, mainProcPid, err := common.GetMainProcesses(mFile)
 	if err != nil {
 		klog.Fatalf("GetMainProcesses failed: %v", err)
 	}
@@ -273,7 +268,7 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string, f
 		klog.Fatalf("Create mark file %v error: %v", markFile, err)
 	}

-	err = os.WriteFile(markFile, []byte(common.TrainingEarlyStopped), 0)
+	err = ioutil.WriteFile(markFile, []byte(common.TrainingEarlyStopped), 0)
 	if err != nil {
 		klog.Fatalf("Write to file %v error: %v", markFile, err)
 	}
@@ -311,7 +306,7 @@ func watchMetricsFile(mFile string, stopRules stopRulesFlag, filters []string, f
 	}

 	// Create connection and client for Early Stopping service.
-	conn, err := grpc.NewClient(*earlyStopServiceAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
+	conn, err := grpc.Dial(*earlyStopServiceAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
 	if err != nil {
 		klog.Fatalf("Could not connect to Early Stopping service, error: %v", err)
 	}
@@ -433,7 +428,7 @@ func main() {

 func reportMetrics(filters []string, fileFormat commonv1beta1.FileFormat) {

-	conn, err := grpc.NewClient(*dbManagerServiceAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
+	conn, err := grpc.Dial(*dbManagerServiceAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
 	if err != nil {
 		klog.Fatalf("Could not connect to DB manager service, error: %v", err)
 	}
@@ -1,24 +1,24 @@
-FROM python:3.11-slim
+FROM python:3.9-slim

-ARG TARGETARCH
 ENV TARGET_DIR /opt/katib
 ENV METRICS_COLLECTOR_DIR cmd/metricscollector/v1beta1/tfevent-metricscollector
-ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/metricscollector/v1beta1/tfevent-metricscollector/::${TARGET_DIR}/pkg/metricscollector/v1beta1/common/

 ADD ./pkg/ ${TARGET_DIR}/pkg/
 ADD ./${METRICS_COLLECTOR_DIR}/ ${TARGET_DIR}/${METRICS_COLLECTOR_DIR}/

 WORKDIR ${TARGET_DIR}/${METRICS_COLLECTOR_DIR}

-RUN if [ "${TARGETARCH}" = "arm64" ]; then \
+RUN if [ "$(uname -m)" = "aarch64" ]; then \
     apt-get -y update && \
     apt-get -y install gfortran libpcre3 libpcre3-dev && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*; \
     fi

-RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt
+
 RUN chgrp -R 0 ${TARGET_DIR} \
   && chmod -R g+rwX ${TARGET_DIR}

+ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/metricscollector/v1beta1/tfevent-metricscollector/::${TARGET_DIR}/pkg/metricscollector/v1beta1/common/
+
 ENTRYPOINT ["python", "main.py"]
@@ -1,6 +1,6 @@
 FROM ibmcom/tensorflow-ppc64le:2.2.0-py3
 ADD . /usr/src/app/github.com/kubeflow/katib
 WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/metricscollector/v1beta1/tfevent-metricscollector/
-RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt
 ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib:/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1beta1/python:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/tfevent-metricscollector/:/usr/src/app/github.com/kubeflow/katib/pkg/metricscollector/v1beta1/common/
 ENTRYPOINT ["python", "main.py"]
@@ -12,15 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import argparse
-from logging import INFO, StreamHandler, getLogger
-
-import api_pb2
-import api_pb2_grpc
-import const
 import grpc
+import argparse
+import api_pb2
 from pns import WaitMainProcesses
+import const
 from tfevent_loader import MetricsCollector
+from logging import getLogger, StreamHandler, INFO

 timeout_in_seconds = 60

@@ -57,28 +55,25 @@ if __name__ == '__main__':
     wait_all_processes = opt.wait_all_processes.lower() == "true"
     db_manager_server = opt.db_manager_server_addr.split(':')
     if len(db_manager_server) != 2:
-        raise Exception(
-            f"Invalid Katib DB manager service address: {opt.db_manager_server_addr}"
-        )
+        raise Exception("Invalid Katib DB manager service address: %s" %
+                        opt.db_manager_server_addr)

     WaitMainProcesses(
         pool_interval=opt.poll_interval,
         timout=opt.timeout,
         wait_all=wait_all_processes,
-        completed_marked_dir=opt.metrics_file_dir,
-    )
+        completed_marked_dir=opt.metrics_file_dir)

-    mc = MetricsCollector(opt.metric_names.split(";"))
+    mc = MetricsCollector(opt.metric_names.split(';'))
     observation_log = mc.parse_file(opt.metrics_file_dir)

-    with grpc.insecure_channel(opt.db_manager_server_addr) as channel:
-        stub = api_pb2_grpc.DBManagerStub(channel)
-        logger.info(
-            f"In {opt.trial_name} {str(len(observation_log.metric_logs))} metrics will be reported."
-        )
-        stub.ReportObservationLog(
-            api_pb2.ReportObservationLogRequest(
-                trial_name=opt.trial_name, observation_log=observation_log
-            ),
-            timeout=timeout_in_seconds,
-        )
+    channel = grpc.beta.implementations.insecure_channel(
+        db_manager_server[0], int(db_manager_server[1]))
+
+    with api_pb2.beta_create_DBManager_stub(channel) as client:
+        logger.info("In " + opt.trial_name + " " +
+                    str(len(observation_log.metric_logs)) + " metrics will be reported.")
+        client.ReportObservationLog(api_pb2.ReportObservationLogRequest(
+            trial_name=opt.trial_name,
+            observation_log=observation_log
+        ), timeout=timeout_in_seconds)
@@ -1,6 +1,6 @@
-psutil==5.9.4
+psutil==5.8.0
 rfc3339>=6.2
-grpcio>=1.64.1
+grpcio==1.41.1
 googleapis-common-protos==1.6.0
-tensorflow==2.16.1
-protobuf>=4.21.12,<5
+tensorflow==2.9.1; platform_machine=="x86_64"
+tensorflow-aarch64==2.9.1; platform_machine=="aarch64"
@@ -0,0 +1,63 @@
+# --- Clone the kubeflow/kubeflow code ---
+FROM ubuntu AS fetch-kubeflow-kubeflow
+
+RUN apt-get update && apt-get install git -y
+
+WORKDIR /kf
+RUN git clone https://github.com/kubeflow/kubeflow.git && \
+    cd kubeflow && \
+    git checkout ecb72c2
+
+# --- Build the frontend kubeflow library ---
+FROM node:12 AS frontend-kubeflow-lib
+
+WORKDIR /src
+
+ARG LIB=/kf/kubeflow/components/crud-web-apps/common/frontend/kubeflow-common-lib
+COPY --from=fetch-kubeflow-kubeflow $LIB/package*.json ./
+RUN npm ci
+
+COPY --from=fetch-kubeflow-kubeflow $LIB/ ./
+RUN npm run build
+
+# --- Build the frontend ---
+FROM node:12 AS frontend
+
+WORKDIR /src
+COPY ./pkg/new-ui/v1beta1/frontend/package*.json ./
+RUN npm ci
+
+COPY ./pkg/new-ui/v1beta1/frontend/ .
+COPY --from=frontend-kubeflow-lib /src/dist/kubeflow/ ./node_modules/kubeflow/
+
+RUN npm run build:prod
+
+# --- Build the backend ---
+FROM golang:alpine AS go-build
+
+WORKDIR /go/src/github.com/kubeflow/katib
+
+# Download packages.
+COPY go.mod .
+COPY go.sum .
+RUN go mod download -x
+
+# Copy sources.
+COPY cmd/ cmd/
+COPY pkg/ pkg/
+
+# Build the binary.
+RUN if [ "$(uname -m)" = "ppc64le" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-ui ./cmd/new-ui/v1beta1; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+    CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-ui ./cmd/new-ui/v1beta1; \
+    else \
+    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-ui ./cmd/new-ui/v1beta1; \
+    fi
+
+# --- Compose the web app ---
+FROM alpine:3.15
+WORKDIR /app
+COPY --from=go-build /go/src/github.com/kubeflow/katib/katib-ui /app/
+COPY --from=frontend /src/dist/static /app/build/static/
+ENTRYPOINT ["./katib-ui"]
@@ -0,0 +1,75 @@
+/*
+Copyright 2022 The Kubeflow Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package main
+
+import (
+	"flag"
+	"fmt"
+	"log"
+	"net/http"
+
+	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
+
+	common_v1beta1 "github.com/kubeflow/katib/pkg/common/v1beta1"
+	ui "github.com/kubeflow/katib/pkg/new-ui/v1beta1"
+)
+
+var (
+	port, host, buildDir, dbManagerAddr *string
+)
+
+func init() {
+	port = flag.String("port", "8080", "The port to listen to for incoming HTTP connections")
+	host = flag.String("host", "0.0.0.0", "The host to listen to for incoming HTTP connections")
+	buildDir = flag.String("build-dir", "/app/build", "The dir of frontend")
+	dbManagerAddr = flag.String("db-manager-address", common_v1beta1.GetDBManagerAddr(), "The address of Katib DB manager")
+}
+
+func main() {
+	flag.Parse()
+	kuh := ui.NewKatibUIHandler(*dbManagerAddr)
+
+	log.Printf("Serving the frontend dir %s", *buildDir)
+	frontend := http.FileServer(http.Dir(*buildDir))
+	http.HandleFunc("/katib/", kuh.ServeIndex(*buildDir))
+	http.Handle("/katib/static/", http.StripPrefix("/katib/", frontend))
+
+	http.HandleFunc("/katib/fetch_experiments/", kuh.FetchAllExperiments)
+
+	http.HandleFunc("/katib/create_experiment/", kuh.CreateExperiment)
+
+	http.HandleFunc("/katib/delete_experiment/", kuh.DeleteExperiment)
+
+	http.HandleFunc("/katib/fetch_experiment/", kuh.FetchExperiment)
+	http.HandleFunc("/katib/fetch_trial/", kuh.FetchTrial)
+	http.HandleFunc("/katib/fetch_suggestion/", kuh.FetchSuggestion)
+
+	http.HandleFunc("/katib/fetch_hp_job_info/", kuh.FetchHPJobInfo)
+	http.HandleFunc("/katib/fetch_hp_job_trial_info/", kuh.FetchHPJobTrialInfo)
+	http.HandleFunc("/katib/fetch_nas_job_info/", kuh.FetchNASJobInfo)
+
+	http.HandleFunc("/katib/fetch_trial_templates/", kuh.FetchTrialTemplates)
+	http.HandleFunc("/katib/add_template/", kuh.AddTemplate)
+	http.HandleFunc("/katib/edit_template/", kuh.EditTemplate)
+	http.HandleFunc("/katib/delete_template/", kuh.DeleteTemplate)
+	http.HandleFunc("/katib/fetch_namespaces", kuh.FetchNamespaces)
+
+	log.Printf("Serving at %s:%s", *host, *port)
+	if err := http.ListenAndServe(fmt.Sprintf("%s:%s", *host, *port), nil); err != nil {
+		panic(err)
+	}
+}
@@ -0,0 +1,36 @@
+FROM alpine:3.15 AS downloader
+ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
+RUN if [ "$(uname -m)" = "ppc64le" ]; then \
+    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
+    elif [ "$(uname -m)" = "aarch64" ]; then \
+    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
+    else \
+    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
+    fi && \
+    chmod +x /bin/grpc_health_probe
+
+FROM python:3.9-slim
+ENV TARGET_DIR /opt/katib
+ENV SUGGESTION_DIR cmd/suggestion/chocolate/v1beta1
+
+RUN apt-get -y update && \
+    apt-get -y install git && \
+    if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
+    apt-get -y install gfortran libopenblas-dev liblapack-dev g++; \
+    fi && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+ADD ./pkg/ ${TARGET_DIR}/pkg/
+ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
+COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
+
+WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
+RUN pip install --no-cache-dir -r requirements.txt
+
+RUN chgrp -R 0 ${TARGET_DIR} \
+  && chmod -R g+rwX ${TARGET_DIR}
+
+ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
+
+ENTRYPOINT ["python", "main.py"]
@@ -0,0 +1,42 @@
+# Copyright 2022 The Kubeflow Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import grpc
+import time
+from pkg.apis.manager.v1beta1.python import api_pb2_grpc
+from pkg.apis.manager.health.python import health_pb2_grpc
+from pkg.suggestion.v1beta1.chocolate.service import ChocolateService
+from concurrent import futures
+
+_ONE_DAY_IN_SECONDS = 60 * 60 * 24
+DEFAULT_PORT = "0.0.0.0:6789"
+
+
+def serve():
+    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
+    service = ChocolateService()
+    api_pb2_grpc.add_SuggestionServicer_to_server(service, server)
+    health_pb2_grpc.add_HealthServicer_to_server(service, server)
+    server.add_insecure_port(DEFAULT_PORT)
+    print("Listening...")
+    server.start()
+    try:
+        while True:
+            time.sleep(_ONE_DAY_IN_SECONDS)
+    except KeyboardInterrupt:
+        server.stop(0)
+
+
+if __name__ == "__main__":
+    serve()
@ -0,0 +1,13 @@
|
|||
grpcio==1.41.1
|
||||
cloudpickle==0.5.6
|
||||
numpy>=1.20.0
|
||||
scikit-learn>=0.24.0
|
||||
scipy>=1.5.4
|
||||
forestci==0.3
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.6.0
|
||||
SQLAlchemy==1.4.26
|
||||
git+https://github.com/AIworx-Labs/chocolate@master
|
||||
ghalton>=0.6.2; platform_machine=="x86_64"
|
||||
git+https://github.com/fmder/ghalton@master; platform_machine=="aarch64"
|
||||
cython>=0.29.24
|
|
@ -1,7 +1,7 @@
|
|||
# Build the Goptuna Suggestion.
|
||||
FROM golang:alpine AS build-env
|
||||
|
||||
ARG TARGETARCH
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
|
||||
WORKDIR /go/src/github.com/kubeflow/katib
|
||||
|
||||
|
@ -15,7 +15,23 @@ COPY cmd/ cmd/
|
|||
COPY pkg/ pkg/
|
||||
|
||||
# Build the binary.
|
||||
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o goptuna-suggestion ./cmd/suggestion/goptuna/v1beta1
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o goptuna-suggestion ./cmd/suggestion/goptuna/v1beta1; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o goptuna-suggestion ./cmd/suggestion/goptuna/v1beta1; \
|
||||
else \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o goptuna-suggestion ./cmd/suggestion/goptuna/v1beta1; \
|
||||
fi
|
||||
|
||||
# Add GRPC health probe.
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
# Copy the Goptuna suggestion into a thin image.
|
||||
FROM alpine:3.15
|
||||
|
@ -23,7 +39,7 @@ FROM alpine:3.15
|
|||
ENV TARGET_DIR /opt/katib
|
||||
|
||||
WORKDIR ${TARGET_DIR}
|
||||
|
||||
COPY --from=build-env /bin/grpc_health_probe /bin/
|
||||
COPY --from=build-env /go/src/github.com/kubeflow/katib/goptuna-suggestion ${TARGET_DIR}/
|
||||
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
|
|
|
@ -24,7 +24,7 @@ import (
|
|||
api_v1_beta1 "github.com/kubeflow/katib/pkg/apis/manager/v1beta1"
|
||||
suggestion "github.com/kubeflow/katib/pkg/suggestion/v1beta1/goptuna"
|
||||
"google.golang.org/grpc"
|
||||
"k8s.io/klog/v2"
|
||||
"k8s.io/klog"
|
||||
)
|
||||
|
||||
const (
|
||||
|
|
|
@ -1,11 +1,19 @@
|
|||
FROM python:3.11-slim
|
||||
FROM alpine:3.15 AS downloader
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ARG TARGETARCH
|
||||
FROM python:3.9-slim
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/hyperband/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ] || [ "${TARGETARCH}" = "arm64" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get clean && \
|
||||
|
@ -14,11 +22,14 @@ RUN if [ "${TARGETARCH}" = "ppc64le" ] || [ "${TARGETARCH}" = "arm64" ]; then \
|
|||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,14 +12,12 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
import time
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.hyperband.service import HyperbandService
|
||||
from concurrent import futures
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
|
|
@ -1,9 +1,9 @@
|
|||
grpcio>=1.64.1
|
||||
grpcio==1.41.1
|
||||
cloudpickle==0.5.6
|
||||
numpy>=1.25.2
|
||||
numpy>=1.20.0
|
||||
scikit-learn>=0.24.0
|
||||
scipy>=1.5.4
|
||||
forestci==0.3
|
||||
protobuf>=4.21.12,<5
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.6.0
|
||||
cython>=0.29.24
|
||||
|
|
|
@ -1,11 +1,19 @@
|
|||
FROM python:3.11-slim
|
||||
FROM alpine:3.15 AS downloader
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ARG TARGETARCH
|
||||
FROM python:3.9-slim
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/hyperopt/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get clean && \
|
||||
|
@ -14,11 +22,14 @@ RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
|||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,14 +12,12 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
import time
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.hyperopt.service import HyperoptService
|
||||
from concurrent import futures
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
|
|
@ -1,10 +1,10 @@
|
|||
grpcio>=1.64.1
|
||||
grpcio==1.41.1
|
||||
cloudpickle==0.5.6
|
||||
numpy>=1.25.2
|
||||
numpy>=1.20.0
|
||||
scikit-learn>=0.24.0
|
||||
scipy>=1.5.4
|
||||
forestci==0.3
|
||||
protobuf>=4.21.12,<5
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.6.0
|
||||
hyperopt==0.2.5
|
||||
cython>=0.29.24
|
||||
|
|
|
@ -1,11 +1,19 @@
|
|||
FROM python:3.11-slim
|
||||
FROM alpine:3.15 as downloader
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ARG TARGETARCH
|
||||
FROM python:3.9-slim
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/nas/darts/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get clean && \
|
||||
|
@ -14,11 +22,14 @@ RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
|||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,15 +12,14 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from concurrent import futures
|
||||
import time
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.nas.darts.service import DartsService
|
||||
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
grpcio>=1.64.1
|
||||
protobuf>=4.21.12,<5
|
||||
grpcio==1.41.1
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.6.0
|
||||
cython>=0.29.24
|
||||
|
|
|
@ -1,11 +1,20 @@
|
|||
FROM python:3.11-slim
|
||||
FROM alpine:3.15 AS downloader
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ARG TARGETARCH
|
||||
FROM python:3.9-slim
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/nas/enas/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get clean && \
|
||||
|
@ -14,11 +23,14 @@ RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
|||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,15 +12,15 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
from concurrent import futures
|
||||
import time
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.nas.enas.service import EnasService
|
||||
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
grpcio>=1.64.1
|
||||
grpcio==1.41.1
|
||||
googleapis-common-protos==1.6.0
|
||||
cython>=0.29.24
|
||||
tensorflow==2.16.1
|
||||
protobuf>=4.21.12,<5
|
||||
tensorflow==2.9.1; platform_machine=="x86_64"
|
||||
tensorflow-aarch64==2.9.1; platform_machine=="aarch64"
|
||||
|
|
|
@ -1,24 +1,34 @@
|
|||
FROM python:3.11-slim
|
||||
FROM alpine:3.15 AS downloader
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ARG TARGETARCH
|
||||
FROM python:3.9-slim
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/optuna/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*; \
|
||||
fi
|
||||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,14 +12,12 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
import time
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.optuna.service import OptunaService
|
||||
from concurrent import futures
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
grpcio>=1.64.1
|
||||
protobuf>=4.21.12,<5
|
||||
grpcio==1.41.1
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.53.0
|
||||
optuna==3.3.0
|
||||
optuna<3.0.0
|
||||
|
|
|
@ -1,24 +1,37 @@
|
|||
FROM python:3.11-slim
|
||||
FROM python:3.9-slim
|
||||
|
||||
ARG TARGETARCH
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/pbt/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.6
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev wget && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*; \
|
||||
else \
|
||||
apt-get -y update && \
|
||||
apt-get -y install wget && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*; \
|
||||
fi
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,14 +12,12 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
import time
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.pbt.service import PbtService
|
||||
from concurrent import futures
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
grpcio>=1.64.1
|
||||
protobuf>=4.21.12,<5
|
||||
grpcio==1.41.1
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.53.0
|
||||
numpy==1.25.2
|
||||
numpy==1.22.2
|
||||
|
|
|
@ -1,24 +1,34 @@
|
|||
FROM python:3.10-slim
|
||||
FROM alpine:3.15 AS downloader
|
||||
ENV GRPC_HEALTH_PROBE_VERSION v0.4.11
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-ppc64le; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64; \
|
||||
else \
|
||||
wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64; \
|
||||
fi && \
|
||||
chmod +x /bin/grpc_health_probe
|
||||
|
||||
ARG TARGETARCH
|
||||
FROM python:3.9-slim
|
||||
ENV TARGET_DIR /opt/katib
|
||||
ENV SUGGESTION_DIR cmd/suggestion/skopt/v1beta1
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
RUN if [ "${TARGETARCH}" = "ppc64le" ]; then \
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
|
||||
apt-get -y update && \
|
||||
apt-get -y install gfortran libopenblas-dev liblapack-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*; \
|
||||
fi
|
||||
|
||||
ADD ./pkg/ ${TARGET_DIR}/pkg/
|
||||
ADD ./${SUGGESTION_DIR}/ ${TARGET_DIR}/${SUGGESTION_DIR}/
|
||||
COPY --from=downloader /bin/grpc_health_probe /bin/grpc_health_probe
|
||||
|
||||
WORKDIR ${TARGET_DIR}/${SUGGESTION_DIR}
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
RUN pip install --prefer-binary --no-cache-dir -r requirements.txt
|
||||
RUN chgrp -R 0 ${TARGET_DIR} \
|
||||
&& chmod -R g+rwX ${TARGET_DIR}
|
||||
|
||||
ENV PYTHONPATH ${TARGET_DIR}:${TARGET_DIR}/pkg/apis/manager/v1beta1/python:${TARGET_DIR}/pkg/apis/manager/health/python
|
||||
|
||||
ENTRYPOINT ["python", "main.py"]
|
||||
|
|
|
@ -12,14 +12,12 @@
|
|||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
from concurrent import futures
|
||||
|
||||
import grpc
|
||||
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
import time
|
||||
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
|
||||
from pkg.apis.manager.health.python import health_pb2_grpc
|
||||
from pkg.suggestion.v1beta1.skopt.service import SkoptService
|
||||
from concurrent import futures
|
||||
|
||||
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
|
||||
DEFAULT_PORT = "0.0.0.0:6789"
|
||||
|
|
|
@ -1,13 +1,10 @@
|
|||
grpcio>=1.64.1
|
||||
grpcio==1.41.1
|
||||
cloudpickle==0.5.6
|
||||
# This is a workaround to avoid the following error.
|
||||
# AttributeError: module 'numpy' has no attribute 'int'
|
||||
# See more: https://github.com/numpy/numpy/pull/22607
|
||||
numpy==1.23.5
|
||||
scikit-learn>=0.24.0, <=1.3.0
|
||||
numpy>=1.20.0
|
||||
scikit-learn>=0.24.0
|
||||
scipy>=1.5.4
|
||||
forestci==0.3
|
||||
protobuf>=4.21.12,<5
|
||||
protobuf==3.19.1
|
||||
googleapis-common-protos==1.6.0
|
||||
scikit-optimize>=0.9.0
|
||||
cython>=0.29.24
|
||||
|
|
|
@ -1,56 +1,15 @@
|
|||
# --- Clone the kubeflow/kubeflow code ---
|
||||
FROM alpine/git AS fetch-kubeflow-kubeflow
|
||||
# Build the Katib UI.
|
||||
FROM node:12.18.1 AS npm-build
|
||||
|
||||
WORKDIR /kf
|
||||
COPY ./pkg/ui/v1beta1/frontend/COMMIT ./
|
||||
RUN git clone https://github.com/kubeflow/kubeflow.git && \
|
||||
COMMIT=$(cat ./COMMIT) && \
|
||||
cd kubeflow && \
|
||||
git checkout $COMMIT
|
||||
# Build frontend.
|
||||
ADD /pkg/ui/v1beta1/frontend /frontend
|
||||
RUN cd /frontend && npm ci
|
||||
RUN cd /frontend && npm run build
|
||||
RUN rm -rf /frontend/node_modules
|
||||
|
||||
# --- Build the frontend kubeflow library ---
|
||||
FROM node:16-alpine AS frontend-kubeflow-lib
|
||||
|
||||
WORKDIR /src
|
||||
|
||||
ARG LIB=/kf/kubeflow/components/crud-web-apps/common/frontend/kubeflow-common-lib
|
||||
COPY --from=fetch-kubeflow-kubeflow $LIB/package*.json ./
|
||||
RUN npm config set fetch-retry-mintimeout 200000 && \
|
||||
npm config set fetch-retry-maxtimeout 1200000 && \
|
||||
npm config get registry && \
|
||||
npm config set registry https://registry.npmjs.org/ && \
|
||||
npm config delete https-proxy && \
|
||||
npm config set loglevel verbose && \
|
||||
npm cache clean --force && \
|
||||
npm ci --force --prefer-offline --no-audit
|
||||
|
||||
COPY --from=fetch-kubeflow-kubeflow $LIB/ ./
|
||||
RUN npm run build
|
||||
|
||||
# --- Build the frontend ---
|
||||
FROM node:16-alpine AS frontend
|
||||
|
||||
WORKDIR /src
|
||||
COPY ./pkg/ui/v1beta1/frontend/package*.json ./
|
||||
RUN npm config set fetch-retry-mintimeout 200000 && \
|
||||
npm config set fetch-retry-maxtimeout 1200000 && \
|
||||
npm config get registry && \
|
||||
npm config set registry https://registry.npmjs.org/ && \
|
||||
npm config delete https-proxy && \
|
||||
npm config set loglevel verbose && \
|
||||
npm cache clean --force && \
|
||||
npm ci --force --prefer-offline --no-audit
|
||||
|
||||
COPY ./pkg/ui/v1beta1/frontend/ .
|
||||
COPY --from=frontend-kubeflow-lib /src/dist/kubeflow/ ./node_modules/kubeflow/
|
||||
|
||||
RUN npm run build:prod
|
||||
|
||||
# --- Build the backend ---
|
||||
# Build backend.
|
||||
FROM golang:alpine AS go-build
|
||||
|
||||
ARG TARGETARCH
|
||||
|
||||
WORKDIR /go/src/github.com/kubeflow/katib
|
||||
|
||||
# Download packages.
|
||||
|
@ -63,11 +22,17 @@ COPY cmd/ cmd/
|
|||
COPY pkg/ pkg/
|
||||
|
||||
# Build the binary.
|
||||
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build -a -o katib-ui ./cmd/ui/v1beta1
|
||||
RUN if [ "$(uname -m)" = "ppc64le" ]; then \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le go build -a -o katib-ui ./cmd/ui/v1beta1; \
|
||||
elif [ "$(uname -m)" = "aarch64" ]; then \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -o katib-ui ./cmd/ui/v1beta1; \
|
||||
else \
|
||||
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o katib-ui ./cmd/ui/v1beta1; \
|
||||
fi
|
||||
|
||||
# --- Compose the web app ---
|
||||
# Copy the backend and frontend into a thin image.
|
||||
FROM alpine:3.15
|
||||
WORKDIR /app
|
||||
COPY --from=go-build /go/src/github.com/kubeflow/katib/katib-ui /app/
|
||||
COPY --from=frontend /src/dist/static /app/build/static/
|
||||
COPY --from=npm-build /frontend/build /app/build
|
||||
ENTRYPOINT ["./katib-ui"]
|
||||
|
|
|
@ -33,7 +33,7 @@ var (
|
|||
)
|
||||
|
||||
func init() {
|
||||
port = flag.String("port", "8080", "The port to listen to for incoming HTTP connections")
|
||||
port = flag.String("port", "80", "The port to listen to for incoming HTTP connections")
|
||||
host = flag.String("host", "0.0.0.0", "The host to listen to for incoming HTTP connections")
|
||||
buildDir = flag.String("build-dir", "/app/build", "The dir of frontend")
|
||||
dbManagerAddr = flag.String("db-manager-address", common_v1beta1.GetDBManagerAddr(), "The address of Katib DB manager")
|
||||
|
@ -45,17 +45,17 @@ func main() {
|
|||
|
||||
log.Printf("Serving the frontend dir %s", *buildDir)
|
||||
frontend := http.FileServer(http.Dir(*buildDir))
|
||||
http.HandleFunc("/katib/", kuh.ServeIndex(*buildDir))
|
||||
http.Handle("/katib/static/", http.StripPrefix("/katib/", frontend))
|
||||
http.Handle("/katib/", http.StripPrefix("/katib/", frontend))
|
||||
|
||||
http.HandleFunc("/katib/fetch_experiments/", kuh.FetchExperiments)
|
||||
http.HandleFunc("/katib/fetch_experiments/", kuh.FetchAllExperiments)
|
||||
|
||||
http.HandleFunc("/katib/create_experiment/", kuh.CreateExperiment)
|
||||
http.HandleFunc("/katib/submit_yaml/", kuh.SubmitYamlJob)
|
||||
http.HandleFunc("/katib/submit_hp_job/", kuh.SubmitParamsJob)
|
||||
http.HandleFunc("/katib/submit_nas_job/", kuh.SubmitParamsJob)
|
||||
|
||||
http.HandleFunc("/katib/delete_experiment/", kuh.DeleteExperiment)
|
||||
|
||||
http.HandleFunc("/katib/fetch_experiment/", kuh.FetchExperiment)
|
||||
http.HandleFunc("/katib/fetch_trial/", kuh.FetchTrial)
|
||||
http.HandleFunc("/katib/fetch_suggestion/", kuh.FetchSuggestion)
|
||||
|
||||
http.HandleFunc("/katib/fetch_hp_job_info/", kuh.FetchHPJobInfo)
|
||||
|
@ -67,7 +67,6 @@ func main() {
|
|||
http.HandleFunc("/katib/edit_template/", kuh.EditTemplate)
|
||||
http.HandleFunc("/katib/delete_template/", kuh.DeleteTemplate)
|
||||
http.HandleFunc("/katib/fetch_namespaces", kuh.FetchNamespaces)
|
||||
http.HandleFunc("/katib/fetch_trial_logs/", kuh.FetchTrialLogs)
|
||||
|
||||
log.Printf("Serving at %s:%s", *host, *port)
|
||||
if err := http.ListenAndServe(fmt.Sprintf("%s:%s", *host, *port), nil); err != nil {
|
||||
|
|
|
@ -1,13 +0,0 @@
|
|||
#!/bin/sh
|
||||
|
||||
# Run conformance test and generate test report.
|
||||
python test/e2e/v1beta1/scripts/gh-actions/run-e2e-experiment.py --experiment-path examples/v1beta1/hp-tuning/random.yaml --namespace kf-conformance \
|
||||
--trial-pod-labels '{"sidecar.istio.io/inject": "false"}' | tee /tmp/katib-conformance.log
|
||||
|
||||
|
||||
# Create the done file.
|
||||
touch /tmp/katib-conformance.done
|
||||
echo "Done..."
|
||||
|
||||
# Keep the container running so the test logs can be downloaded.
|
||||
while true; do sleep 10000; done
|
|
@ -1,5 +0,0 @@
|
|||
# Katib Documentation
|
||||
|
||||
Welcome to Kubeflow Katib!
|
||||
|
||||
The Katib documentation is available on [kubeflow.org](https://www.kubeflow.org/docs/components/katib/).
|
|
@ -0,0 +1,131 @@
|
|||
# Developer Guide
|
||||
|
||||
This developer guide is for people who want to contribute to the Katib project.
|
||||
If you're interested in using Katib in your machine learning project,
|
||||
see the following user guides:
|
||||
|
||||
- [Concepts](https://www.kubeflow.org/docs/components/katib/overview/)
|
||||
in Katib, hyperparameter tuning, and neural architecture search.
|
||||
- [Getting started with Katib](https://kubeflow.org/docs/components/katib/hyperparameter/).
|
||||
- Detailed guide to [configuring and running a Katib
|
||||
experiment](https://kubeflow.org/docs/components/katib/experiment/).
|
||||
|
||||
## Requirements
|
||||
|
||||
- [Go](https://golang.org/) (1.17 or later)
|
||||
- [Docker](https://docs.docker.com/) (20.10 or later)
|
||||
- [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html) (8 or later)
|
||||
- [Python](https://www.python.org/) (3.9 or later)
|
||||
- [kustomize](https://kustomize.io/) (4.0.5 or later)
|
||||
|
||||
## Build from source code
|
||||
|
||||
Build Katib from source as follows:
|
||||
|
||||
```bash
|
||||
make build REGISTRY=<image-registry> TAG=<image-tag>
|
||||
```
|
||||
|
||||
To use your custom images for the Katib components, modify
|
||||
[Kustomization file](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/kustomization.yaml)
|
||||
and [Katib Config](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/components/controller/katib-config.yaml).
|
||||
|
||||
You can deploy Katib v1beta1 manifests into a Kubernetes cluster as follows:
|
||||
|
||||
```bash
|
||||
make deploy
|
||||
```
|
||||
|
||||
You can undeploy Katib v1beta1 manifests from a Kubernetes cluster as follows:
|
||||
|
||||
```bash
|
||||
make undeploy
|
||||
```
|
||||
|
||||
## Modify controller APIs
|
||||
|
||||
If you want to modify Katib controller APIs, you have to
|
||||
generate deepcopy, clientset, listers, informers, open-api and Python SDK with the changed APIs.
|
||||
You can update the necessary files as follows:
|
||||
|
||||
```bash
|
||||
make generate
|
||||
```
|
||||
|
||||
## Controller Flags
|
||||
|
||||
Below is a list of command-line flags accepted by Katib controller:
|
||||
|
||||
| Name | Type | Default | Description |
|
||||
| ------------------------------- | ------------------------- | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
|
||||
| enable-grpc-probe-in-suggestion | bool | true | Enable grpc probe in suggestions |
|
||||
| experiment-suggestion-name | string | "default" | The implementation of suggestion interface in experiment controller |
|
||||
| metrics-addr | string | ":8080" | The address the metric endpoint binds to |
|
||||
| trial-resources | []schema.GroupVersionKind | null | The list of resources that can be used as trial template, in the form: Kind.version.group (e.g. TFJob.v1.kubeflow.org) |
|
||||
| webhook-inject-securitycontext | bool | false | Inject the securityContext of container[0] in the sidecar |
|
||||
| webhook-port | int | 8443 | The port number to be used for admission webhook server |
|
||||
| enable-leader-election | bool | false | Enable leader election for katib-controller. Enabling this will ensure there is only one active katib-controller. |
|
||||
| leader-election-id | string | "3fbc96e9.katib.kubeflow.org" | The ID for leader election. |
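
As a rough illustration of how flags like those in the table can be declared, here is a minimal sketch using Go's standard `flag` package. Only the flag names, defaults, and descriptions come from the table; the `controllerFlags` struct and the `parseControllerFlags` function are hypothetical names chosen for this example and are not taken from the Katib source.

```go
package main

import (
	"flag"
	"fmt"
)

// controllerFlags holds a subset of the flags listed in the table.
// The struct itself is a hypothetical container used only in this sketch.
type controllerFlags struct {
	MetricsAddr          string
	WebhookPort          int
	EnableLeaderElection bool
	LeaderElectionID     string
}

// parseControllerFlags registers the flags on a fresh FlagSet, so the
// function can be called repeatedly, and parses the given arguments.
func parseControllerFlags(args []string) (controllerFlags, error) {
	var cfg controllerFlags
	fs := flag.NewFlagSet("katib-controller", flag.ContinueOnError)
	fs.StringVar(&cfg.MetricsAddr, "metrics-addr", ":8080",
		"The address the metric endpoint binds to")
	fs.IntVar(&cfg.WebhookPort, "webhook-port", 8443,
		"The port number to be used for admission webhook server")
	fs.BoolVar(&cfg.EnableLeaderElection, "enable-leader-election", false,
		"Enable leader election for katib-controller")
	fs.StringVar(&cfg.LeaderElectionID, "leader-election-id",
		"3fbc96e9.katib.kubeflow.org", "The ID for leader election")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	// Override two defaults; the remaining flags keep the table's values.
	cfg, err := parseControllerFlags([]string{
		"--webhook-port=9443",
		"--enable-leader-election",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Using a dedicated `FlagSet` instead of the global one keeps the parsing logic testable in isolation.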
|
||||
|
||||
## Workflow design
|
||||
|
||||
Please see [workflow-design.md](./workflow-design.md).
|
||||
|
||||
## Katib admission webhooks
|
||||
|
||||
Katib uses three [Kubernetes admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/).
|
||||
|
||||
1. `validator.experiment.katib.kubeflow.org` -
|
||||
[Validating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook)
|
||||
to validate the Katib Experiment before it is created.
|
||||
|
||||
1. `defaulter.experiment.katib.kubeflow.org` -
|
||||
[Mutating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook)
|
||||
to set the [default values](../pkg/apis/controller/experiments/v1beta1/experiment_defaults.go)
|
||||
in the Katib Experiment before it is created.
|
||||
|
||||
1. `mutator.pod.katib.kubeflow.org` - Mutating admission webhook to inject the metrics
|
||||
collector sidecar container into the training pod. Learn more about Katib's
|
||||
metrics collector in the
|
||||
[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/experiment/#metrics-collector).
|
||||
|
||||
You can find the YAMLs for the Katib webhooks
|
||||
[here](../manifests/v1beta1/components/webhook/webhooks.yaml).
|
||||
|
||||
**Note:** If you are using a private Kubernetes cluster, you have to add a firewall rule
that allows traffic via `TCP:8443`, and you have to update the master
plane CIDR source range to use the Katib webhooks.
|
||||
|
||||
### Katib cert generator
|
||||
|
||||
Katib uses the custom `cert-generator` [Kubernetes Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/)
|
||||
to generate certificates for the webhooks.
|
||||
|
||||
Once Katib is deployed in the Kubernetes cluster, the `cert-generator` Job follows these steps:
|
||||
|
||||
- Generate the self-signed CA certificate and private key.
|
||||
|
||||
- Generate a public certificate and private key, with the certificate signed by the CA key generated in the previous step.
|
||||
|
||||
- Create a Kubernetes Secret with the signed certificate. The Secret has
the `katib-webhook-cert` name and the `cert-generator` Job's `ownerReference` to
clean up resources once Katib is uninstalled.
|
||||
|
||||
Once the Secret is created, the Katib controller Deployment spawns the Pod,
|
||||
since the controller has the `katib-webhook-cert` Secret volume.
|
||||
|
||||
- Patch the webhooks with the `CABundle`.
|
||||
|
||||
You can find the `cert-generator` source code [here](../cmd/cert-generator/v1beta1).
|
||||
|
||||
## Implement a new algorithm and use it in Katib
|
||||
|
||||
Please see [new-algorithm-service.md](./new-algorithm-service.md).
|
||||
|
||||
## Katib UI documentation
|
||||
|
||||
Please see [Katib UI README](https://github.com/kubeflow/katib/tree/master/pkg/ui/v1beta1).
|
||||
|
||||
## Design proposals
|
||||
|
||||
Please see [proposals](./proposals).
|
|
@ -5,7 +5,7 @@ Here you can find the location for images that are used in Katib.
|
|||
## Katib Components Images
|
||||
|
||||
The following table shows images for the
|
||||
[Katib components](https://www.kubeflow.org/docs/components/katib/reference/architecture/#katib-control-plane-components).
|
||||
[Katib components](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-components).
|
||||
|
||||
<table>
|
||||
<tbody>
|
||||
|
@ -22,7 +22,7 @@ The following table shows images for the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/katib-controller</code>
|
||||
<code>docker.io/kubeflowkatib/katib-controller</code>
|
||||
</td>
|
||||
<td>
|
||||
Katib Controller
|
||||
|
@ -33,7 +33,7 @@ The following table shows images for the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/katib-ui</code>
|
||||
<code>docker.io/kubeflowkatib/katib-ui</code>
|
||||
</td>
|
||||
<td>
|
||||
Katib User Interface
|
||||
|
@ -44,7 +44,7 @@ The following table shows images for the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/katib-db-manager</code>
|
||||
<code>docker.io/kubeflowkatib/katib-db-manager</code>
|
||||
</td>
|
||||
<td>
|
||||
Katib DB Manager
|
||||
|
@ -64,13 +64,24 @@ The following table shows images for the
|
|||
<a href="https://github.com/docker-library/mysql/blob/c506174eab8ae160f56483e8d72410f8f1e1470f/8.0/Dockerfile.debian">Dockerfile</a>
|
||||
</td>
|
||||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>docker.io/kubeflowkatib/cert-generator</code>
|
||||
</td>
|
||||
<td>
|
||||
Katib Cert Generator
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/kubeflow/katib/blob/master/cmd/cert-generator/v1beta1/Dockerfile">Dockerfile</a>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
## Katib Metrics Collectors Images
|
||||
|
||||
The following table shows images for the
|
||||
[Katib Metrics Collectors](https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/).
|
||||
[Katib Metrics Collectors](https://www.kubeflow.org/docs/components/katib/experiment/#metrics-collector).
|
||||
|
||||
<table>
|
||||
<tbody>
|
||||
|
@ -87,7 +98,7 @@ The following table shows images for the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/file-metrics-collector</code>
|
||||
<code>docker.io/kubeflowkatib/file-metrics-collector</code>
|
||||
</td>
|
||||
<td>
|
||||
File Metrics Collector
|
||||
|
@ -98,7 +109,7 @@ The following table shows images for the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/tfevent-metrics-collector</code>
|
||||
<code>docker.io/kubeflowkatib/tfevent-metrics-collector</code>
|
||||
</td>
|
||||
<td>
|
||||
Tensorflow Event Metrics Collector
|
||||
|
@ -113,8 +124,8 @@ The following table shows images for the
|
|||
## Katib Suggestions and Early Stopping Images
|
||||
|
||||
The following table shows images for the
|
||||
[Katib Suggestion services](https://www.kubeflow.org/docs/components/katib/reference/architecture/#suggestion)
|
||||
and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/components/katib/user-guides/early-stopping/#early-stopping-algorithms).
|
||||
[Katib Suggestions](https://www.kubeflow.org/docs/components/katib/experiment/#search-algorithms-in-detail)
|
||||
and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/components/katib/early-stopping/).
|
||||
|
||||
<table>
|
||||
<tbody>
|
||||
|
@ -131,7 +142,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-hyperopt</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-hyperopt</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/hyperopt/hyperopt">Hyperopt</a> Suggestion
|
||||
|
@ -142,7 +153,18 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-skopt</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-chocolate</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/AIworx-Labs/chocolate">Chocolate</a> Suggestion
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/kubeflow/katib/blob/master/cmd/suggestion/chocolate/v1beta1/Dockerfile">Dockerfile</a>
|
||||
</td>
|
||||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>docker.io/kubeflowkatib/suggestion-skopt</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/scikit-optimize/scikit-optimize">Skopt</a> Suggestion
|
||||
|
@ -153,7 +175,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-optuna</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-optuna</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/optuna/optuna">Optuna</a> Suggestion
|
||||
|
@ -164,7 +186,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-goptuna</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-goptuna</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/c-bata/goptuna">Goptuna</a> Suggestion
|
||||
|
@ -175,7 +197,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-hyperband</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-hyperband</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#hyperband">Hyperband</a> Suggestion
|
||||
|
@ -186,7 +208,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-enas</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-enas</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#enas">ENAS</a> Suggestion
|
||||
|
@ -197,7 +219,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/suggestion-darts</code>
|
||||
<code>docker.io/kubeflowkatib/suggestion-darts</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://www.kubeflow.org/docs/components/katib/experiment/#differentiable-architecture-search-darts">DARTS</a> Suggestion
|
||||
|
@ -208,7 +230,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/earlystopping-medianstop</code>
|
||||
<code>docker.io/kubeflowkatib/earlystopping-medianstop</code>
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://www.kubeflow.org/docs/components/katib/early-stopping/#median-stopping-rule">Median Stopping Rule</a>
|
||||
|
@ -223,7 +245,7 @@ and the [Katib Early Stopping algorithms](https://www.kubeflow.org/docs/componen
|
|||
## Training Containers Images
|
||||
|
||||
The following table shows images for training containers which are used in the
|
||||
[Katib Trials](https://www.kubeflow.org/docs/components/katib/reference/architecture/#trial).
|
||||
[Katib Trials](https://www.kubeflow.org/docs/components/katib/experiment/#packaging-your-training-code-in-a-container-image).
|
||||
|
||||
<table>
|
||||
<tbody>
|
||||
|
@ -240,7 +262,18 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/pytorch-mnist-cpu</code>
|
||||
<code>docker.io/kubeflowkatib/mxnet-mnist</code>
|
||||
</td>
|
||||
<td>
|
||||
MXNet MNIST example with collecting metrics time
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/kubeflow/katib/blob/master/examples/v1beta1/trial-images/mxnet-mnist/Dockerfile">Dockerfile</a>
|
||||
</td>
|
||||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>docker.io/kubeflowkatib/pytorch-mnist-cpu</code>
|
||||
</td>
|
||||
<td>
|
||||
PyTorch MNIST example with printing metrics to the file or StdOut with CPU support
|
||||
|
@ -251,7 +284,7 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/pytorch-mnist-gpu</code>
|
||||
<code>docker.io/kubeflowkatib/pytorch-mnist-gpu</code>
|
||||
</td>
|
||||
<td>
|
||||
PyTorch MNIST example with printing metrics to the file or StdOut with GPU support
|
||||
|
@ -262,7 +295,7 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/tf-mnist-with-summaries</code>
|
||||
<code>docker.io/kubeflowkatib/tf-mnist-with-summaries</code>
|
||||
</td>
|
||||
<td>
|
||||
Tensorflow MNIST example with saving metrics in the summaries
|
||||
|
@ -273,7 +306,18 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/xgboost-lightgbm</code>
|
||||
<code>docker.io/bytepsimage/mxnet</code>
|
||||
</td>
|
||||
<td>
|
||||
Distributed BytePS example for MXJob
|
||||
</td>
|
||||
<td>
|
||||
<a href="https://github.com/bytedance/byteps/blob/v0.2.5/docker/Dockerfile">Dockerfile</a>
|
||||
</td>
|
||||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>docker.io/kubeflowkatib/xgboost-lightgbm</code>
|
||||
</td>
|
||||
<td>
|
||||
Distributed LightGBM example for XGBoostJob
|
||||
|
@ -306,7 +350,7 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/enas-cnn-cifar10-gpu</code>
|
||||
<code>docker.io/kubeflowkatib/enas-cnn-cifar10-gpu</code>
|
||||
</td>
|
||||
<td>
|
||||
Keras CIFAR-10 CNN example for ENAS with GPU support
|
||||
|
@ -317,7 +361,7 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/enas-cnn-cifar10-cpu</code>
|
||||
<code>docker.io/kubeflowkatib/enas-cnn-cifar10-cpu</code>
|
||||
</td>
|
||||
<td>
|
||||
Keras CIFAR-10 CNN example for ENAS with CPU support
|
||||
|
@ -328,7 +372,7 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/darts-cnn-cifar10-gpu</code>
|
||||
<code>docker.io/kubeflowkatib/darts-cnn-cifar10-gpu</code>
|
||||
</td>
|
||||
<td>
|
||||
PyTorch CIFAR-10 CNN example for DARTS with GPU support
|
||||
|
@ -339,7 +383,7 @@ The following table shows images for training containers which are used in the
|
|||
</tr>
|
||||
<tr align="center">
|
||||
<td>
|
||||
<code>ghcr.io/kubeflow/katib/darts-cnn-cifar10-cpu</code>
|
||||
<code>docker.io/kubeflowkatib/darts-cnn-cifar10-cpu</code>
|
||||
</td>
|
||||
<td>
|
||||
PyTorch CIFAR-10 CNN example for DARTS with CPU support
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 102 KiB |
Binary file not shown.
After Width: | Height: | Size: 192 KiB |
Before Width: | Height: | Size: 166 KiB After Width: | Height: | Size: 166 KiB |