Commit Graph

453 Commits

Author SHA1 Message Date
Andrey Velichkevich c6054de898
trainer: Deploy the Released Version (#4156)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-07-25 14:00:03 +00:00
Mathew Wicks add96db991
website: use OWNERS to set area labels on PRs (#4154)
* chore: use OWNERS to set area labels

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* chore: update PR template

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

---------

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
2025-07-23 15:22:01 +00:00
kundan kumar 37b5f82cf3
trainer: User guide for PyTorch Training (#4053)
* Add PyTorch guide

Co-authored-by: izuku-sds <izuku.labs@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Refactor PyTorch Guide

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix some text

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add next steps to getting started

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update content/en/docs/components/trainer/user-guides/pytorch.md

Co-authored-by: Anya Kramar <akramar@redhat.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update content/en/docs/components/trainer/user-guides/pytorch.md

Co-authored-by: Antonin Stefanutti <astefanutti@users.noreply.github.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Change text for FSDP

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Rename to PyTorch nodes

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Remove step from get_job_logs

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Remove TODOs

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update content/en/docs/components/trainer/user-guides/pytorch.md

Co-authored-by: Antonin Stefanutti <astefanutti@users.noreply.github.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update DDP description

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Remove SDK API name from title

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Rename to Kubeflow SDK

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add Prerequisites

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add todos

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Anya Kramar <akramar@redhat.com>
Co-authored-by: Antonin Stefanutti <astefanutti@users.noreply.github.com>
2025-07-22 16:15:58 +00:00
Fellipe Resende 3027fc3fc6
spark-operator: Documentation to integrate Spark Operator and Notebooks (#4141)
* Documentation to integrate Spark Operator and Notebooks

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

* updated documentation

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

* updated getting started

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

* cleaning up

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

* add jeg link

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

* fixed kubeflow logo in diagram

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

* standardize terminology

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>

---------

Signed-off-by: Fellipe Resende <fellipe.barros.resende@gmail.com>
2025-07-21 14:01:58 +00:00
Matteo Mortari 52c8955a9b
model-registry: update Model Registry architecture for MLMD removal (#4148)
* model-registry: update Model Registry architecture for MLMD removal

Update architecture documentation to reflect the removal of ML-Metadata
C++ server dependency and transition to direct RDBMS backend operations.

- Replace ML-Metadata C++ server references with RDBMS backend
- Remove gRPC references of MLMD
- Add link to logical model documentation
- Update project documentation link
- Improve grammar and consistency throughout the document

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>

* Update content/en/docs/components/model-registry/reference/architecture.md

Co-authored-by: Alessio Pragliola <83355398+Al-Pragliola@users.noreply.github.com>
Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>

---------

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
Co-authored-by: Alessio Pragliola <83355398+Al-Pragliola@users.noreply.github.com>
2025-07-18 09:41:41 +00:00
Matteo Mortari 81dc23e071
model-registry: update architecture (#4147)
Follow-up to https://github.com/kubeflow/model-registry/issues/865

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
2025-07-18 07:46:40 +00:00
Sadhvi 7ea872aa1b
trainer: migration guide from training operator to Trainer V2 (#4142)
* migration guide from training operator to kubeflow trainer V2

Signed-off-by: akiseakusa <sadhvi8807@gmail.com>

* Update documentation with default Torch runtime and TrainJob override

Signed-off-by: akiseakusa <sadhvi8807@gmail.com>

* workflow write permission added

Signed-off-by: akiseakusa <sadhvi8807@gmail.com>

* Remove permission from workflow

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update the migration guide

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: akiseakusa <sadhvi8807@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-07-17 19:56:40 +00:00
Andrey Velichkevich 0453687e2b
trainer: Update Kubeflow Trainer personas diagram (#4144)
* trainer: Update Kubeflow Trainer personas diagram

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update personas

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update Lifecycle Diagram

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update runtime guide

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Add dependsOn

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-07-16 15:42:39 +00:00
Garvit Khandelwal ab27782aee
trainer: Documentation for Runtime Guide (#4054)
* initial commit

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* reviewed-changes

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* refined commit

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* changes

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* revised changes: Runtime.md

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* merge commit

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* modified commit

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* modified commit

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

---------

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>
2025-07-16 14:51:39 +00:00
Michael b593f10706
pipelines: Control Flow - remove unsupported banners (#4143)
Features currently implemented on master for Pipelines that will be available in 2.6:
loop parallelism - https://github.com/kubeflow/pipelines/pull/10798
oneOf - https://github.com/kubeflow/pipelines/pull/11196
dsl.Collected - https://github.com/kubeflow/pipelines/pull/11725
PipelineTaskFinalStatus - https://github.com/kubeflow/pipelines/pull/11953

Signed-off-by: Michael <m.zazula@gmail.com>
2025-07-11 21:31:53 +00:00
SanthoshToorpu 63a12d7a47
katib: Update LLM HP tuning guide to clarify tunable fields and fix resource section (#4067)
* Update Katib LLM HP tuning guide: clarify tunable fields, fix resources config, remove unsupported params

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* fixed broken link

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* added cross referencing of huggingface params to the legacy trainer v1 docs instead of referring in the llm-hp file

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* changed the broken link

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* removed a false positive

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* removed the unrelated params and moved the s3dataset params to the legacy trainer docs

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* Changed the title from hugging face params to dataset and model parameter classes

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

* fixed a broken link due to change in name

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>

---------

Signed-off-by: SanthoshToorpu <toorpusanthosh@gmail.com>
2025-06-17 15:39:09 +00:00
M!l!nd 77b9ce9bbe
pipelines: fix link in `Migrate to Kubeflow Pipelines v2` (#4049)
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>
2025-06-17 14:08:10 +00:00
Fabrice Jammes 8a99b4da04
spark-operator: Document how to monitor with jmx and prometheus (#4098)
Signed-off-by: Fabrice Jammes <fabrice.jammes@clermont.in2p3.fr>
2025-06-17 14:05:09 +00:00
Daniel Dowler f3d6b27d7d
pipelines: Update pipeline concept docs (#4074)
* created IR YAML concept page

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* updated pipeline and pipeline root concept pages

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* updated links

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* Update content/en/docs/components/pipelines/concepts/pipeline.md

Co-authored-by: Ricardo Martinelli de Oliveira <rmartine@redhat.com>
Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* fixed formatting issues

Co-authored-by: Helber Belmiro <helber.belmiro@gmail.com>
Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* removed external vendor platform mention

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* link formatting

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* minor formatting

Co-authored-by: Helber Belmiro <helber.belmiro@gmail.com>
Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

fixed word plurality

Co-authored-by: Helber Belmiro <helber.belmiro@gmail.com>
Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* fixed link issues

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

---------

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>
Co-authored-by: Ricardo Martinelli de Oliveira <rmartine@redhat.com>
Co-authored-by: Helber Belmiro <helber.belmiro@gmail.com>
2025-06-17 14:04:09 +00:00
Yunkai Li efb1156f29
docs(pipelines): Add troubleshooting tip for standalone Pipelines installation (#4084)
* docs(pipelines): add troubleshooting tip for installation using platform-agnostic path

Some users reported pod crashes (e.g., proxy-agent, workflow-controller) when following the default installation instructions using `env/dev`. This patch adds a troubleshooting tip to use `env/platform-agnostic` as an alternative.

Verified working on Minikube with v2.0.0, based on community suggestions (see kubeflow/pipelines#9546).

Fixes kubeflow/pipelines#11757

Signed-off-by: Yunkai Li <54110380+kaikaila@users.noreply.github.com>

* Correct the format of "3 minutes to complete"

Signed-off-by: Yunkai Li <54110380+kaikaila@users.noreply.github.com>

---------

Signed-off-by: Yunkai Li <54110380+kaikaila@users.noreply.github.com>
2025-06-17 14:01:09 +00:00
MonkeyCanCode 5dec790d73
Update benchmarking.md
Signed-off-by: MonkeyCanCode <yongzheng0809@gmail.com>
2025-06-10 13:54:36 -05:00
Eoin Fennessy 5e7a27edab
trainer: Update SDK source repo (#4126)
* Update Trainer SDK source repo

Signed-off-by: Eoin Fennessy <efenness@redhat.com>

* Update Trainer SDK link

Signed-off-by: Eoin Fennessy <efenness@redhat.com>

---------

Signed-off-by: Eoin Fennessy <efenness@redhat.com>
2025-06-10 01:05:48 +00:00
Yi Chen e683c0c084
spark-operator: Remove docs associated with sparkctl (#4089)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-06 04:38:16 +00:00
Matteo Mortari fc0b2afa57
model-registry: add Al-Pragliola in approvers (#4124)
Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
2025-06-02 11:59:23 +00:00
Anton Pechenin 7718b11b08
add a new s3.forcePathStyle configuration parameter to the samples. (#3885)
Signed-off-by: arpechenin <arpechenin@avito.ru>
2025-05-22 11:59:20 +00:00
Matteo Mortari 85b5867c7e
model-registry: bump version to 0.2.18 (#4114)
* model-registry: bump version to 0.2.18

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>

* reflect manifest changes for standalone installation

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>

Co-authored-by: Alessio Pragliola <83355398+Al-Pragliola@users.noreply.github.com>

---------

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
Co-authored-by: Alessio Pragliola <83355398+Al-Pragliola@users.noreply.github.com>
2025-05-20 08:56:36 +00:00
Matteo Mortari d67c9a2830
model-registry: better link to issue selector screen (#4107)
Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
2025-05-19 18:14:36 +00:00
Matt Prahl 54005ba799
pipelines: Add KFP documentation for importing modelcars (#4097)
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
2025-05-07 20:49:41 +00:00
M!l!nd d95e60f796
model-registry: add YAML manifest example for creating InferenceService (#4048)
* add YAML manifest example for creating InferenceService

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>

* Update content/en/docs/components/model-registry/getting-started.md

Co-authored-by: Matteo Mortari <matteo.mortari@gmail.com>
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>

* Update content/en/docs/components/model-registry/getting-started.md

Co-authored-by: Matteo Mortari <matteo.mortari@gmail.com>
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>

---------

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>
Co-authored-by: Matteo Mortari <matteo.mortari@gmail.com>
2025-05-06 19:55:40 +00:00
Mahdi Khashan 3704da997f
trainer: update fine-tune example to use `eval_strategy` (#4099)
Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
2025-05-02 13:43:04 +00:00
Daniel Dowler 23d50fea25
pipelines: updated KFP component concept page (#4062)
* updated KFP component concept page

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* fix links

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* added back next steps section

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* minor wording changes

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* added todo comment

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* minor updates

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* small formatting updates

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* fixed link, small formatting

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* Update content/en/docs/components/pipelines/concepts/component.md

Co-authored-by: Matt Prahl <mprahl@users.noreply.github.com>
Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

* replaced hard-coded image with command for maintainability

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>

---------

Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>
Co-authored-by: Matt Prahl <mprahl@users.noreply.github.com>
2025-04-25 13:33:58 +00:00
Rahul k 239078a929
fix: getting-started broken link (#4091)
Signed-off-by: Rahul k <83643646+rahulk789@users.noreply.github.com>
2025-04-25 13:27:59 +00:00
Helber Belmiro b75743b25e
pipelines: Updated Pipelines' OWNERS file (#4087)
Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
2025-04-14 19:49:22 +00:00
Helber Belmiro 5c9995dedb
pipelines: Added proxy documentation (#4073)
* pipelines: Added proxy documentation

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>

* Update content/en/docs/components/pipelines/operator-guides/server-config.md

Co-authored-by: Matt Prahl <mprahl@users.noreply.github.com>
Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>

* pipelines: Removed unneeded sections

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>

---------

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
Co-authored-by: Matt Prahl <mprahl@users.noreply.github.com>
2025-04-14 17:51:21 +00:00
Alessio Pragliola d812996056
notebooks: update container images to ghcr.io (#4082)
* chore(notebooks): update container images to ghcr.io

Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>

* fix(notebooks): remove additional kubeflow/ from links

Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>

---------

Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
2025-04-11 15:47:06 +00:00
Garvit Khandelwal 0d462bd03e
trainer: Add documentation for the MultiKueue and spec.managedBy API (#3956)
* resolves kubeflow/training/#2279

Signed-off-by: Garvit-77 <garvitname@gmail.com>

* Update content/en/docs/components/training/user-guides/managedby.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Update content/en/docs/components/training/user-guides/managedby.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Update content/en/docs/components/training/user-guides/managedby.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Update content/en/docs/components/training/user-guides/managedby.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Update content/en/docs/components/training/user-guides/managedby.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Update content/en/docs/components/training/user-guides/managedby.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Create managedby.md

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Delete content/en/docs/components/training/user-guides/managedby.md

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Comments-updated

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Update tensorflow.md

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* Updated weight managedby.md

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

* updated

Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>

---------

Signed-off-by: Garvit-77 <garvitname@gmail.com>
Signed-off-by: Garvit Khandelwal <70192868+Garvit-77@users.noreply.github.com>
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
2025-04-10 18:13:14 +00:00
Mathew Wicks 4f092f1576
website: Add dark theme (#3981)
* Add dark theme to website

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Fix white borders on images in dark mode

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Fix tables in dark mode

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Fix home action buttons on very small screens

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Undo architecture diagram changes

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Update trainer homepage copy based on review

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Make variables for KF colors

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Update search colors based on review

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Use project logos with words on homepage

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* Make home section borders gray

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

---------

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
2025-03-30 01:45:29 +00:00
Andrey Velichkevich d116f2d311
trainer: Update Slack channel (#4059)
* trainer: Update Slack channel

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix link

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-03-26 21:06:17 +00:00
Daniel Dowler 260ae4bce3
pipelines: Improvements to the KFP overview doc (#4057)
Signed-off-by: Daniel Dowler <12484302+dandawg@users.noreply.github.com>
2025-03-25 12:07:50 +00:00
Mahdi Khashan de75668704
katib: [USERGUIDE] LLM Hyperparameter Optimization API (#3952)
* add base md

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update title and description

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add draft code

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add prerequisites

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add huggingface api details,s3 api, update example

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* remove redundant text

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add HuggingFaceTrainerParams description

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update prerequisites

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update code example

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add sections

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* replace language models with large language models

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve prerequisites

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* algorithm_name is optional

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* objective_type is optional

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* objective_metric_name is optional

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* remove redundant example

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* change tune args to optional

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add search api

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update link title

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add two scenarios for tune function with custom objective or loading model and parameters from hugging face

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add link for custom objective function example

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve tune section

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve title

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix failing ci

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add warning of alpha api

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve links

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve python code consistency

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* define search space for r in LoraConfig

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* remove redundant line

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* make sure imports are all consistent in snippets

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve link

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve fine-tune section

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve links in prerequisites

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve structure of integrations section

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* add missing import

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* replace local address instead of hardcoded link to website

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix import

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* use hyperparameter optimization instead of fine-tune

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix header levels

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* replace code import

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* replace name

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* replace definition of distributed training

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* decrease header level

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* decrease header level

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update configuration for `resource_per_trial`

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update header title

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update training operator control plane

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* improve prerequisites

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update prerequisites

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update page intro description

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update page description

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* update page title and adjust description letter

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* move the doc to the parent folder

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* remove section

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* modify message of the page

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix typo

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* fix typo

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

* mention Training Operator

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>

---------

Signed-off-by: mahdikhashan <mahdikhashan1@gmail.com>
2025-03-25 11:33:51 +00:00
Eoin Fennessy a534660c58
trainer: update getting-started.md to reflect API changes (#4058)
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
2025-03-25 11:15:50 +00:00
Paul Boyd b4f1910259
model-registry: update the installation guide (#4042)
I found installing model registry from the instructions on the website
to be rather hard to do. The instructions don't actually have all the
steps needed, so they sent me to the README in the manifests repo, which,
confusingly, sent me right back to the website. Neither document is
entirely complete, so I had to cobble together bits from both places.

I'm assuming these instructions are primarily useful to existing
Kubeflow users who want to try out model registry, so I've rewritten the
"Installing on Kubeflow Platform" section with the goal of getting those
users up and running quickly. I'm further assuming that these users will
want the UI (I mean, it's a good UI, who wouldn't?), so we can cut the
"if you want the UI" verbiage and simplify some of the commands.

Signed-off-by: Paul Boyd <pboyd@redhat.com>
2025-03-20 16:50:03 +00:00
Yi Chen 11d6166464
spark-operator: customize spark operator (#4040)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-20 10:39:41 +00:00
Shao Wang 59469d8525
trainer: update get-started example of trainer. (#4041)
Signed-off-by: Electronic-Waste <2690692950@qq.com>
2025-03-13 02:40:52 +00:00
Vara Bonthu a7641d3c82
feat: Kubeflow Spark Operator Benchmarks Docs (#4030)
* feat: Spark Operator Benchmarks Docs

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

* Fixed image links in the doc

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

* Updated with links and added Future work and ACK section

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>

---------

Signed-off-by: Vara Bonthu <vara.bonthu@gmail.com>
2025-03-12 03:12:35 +00:00
Takafumi Takahashi fb10cdafff
Fix Scheduling Policy doc for MPIJob (#4033)
Signed-off-by: ttakahashi21 <takafumi.takahashi@hitachivantara.com>
2025-03-11 16:33:40 +00:00
Lucas Fernandez 7cb931dc86
Add model registry UI installation instructions (#4013)
Signed-off-by: Lucas Fernandez <lucasfernandezaragon@gmail.com>
2025-03-03 13:55:45 +00:00
Valentina Rodriguez Sosa 0a9eeff47c
Small updates to Central Dashboard docs (#3946)
* update fix/broken links

Signed-off-by: varodrig <varodrig@redhat.com>

* minor updates to documentation

Signed-off-by: varodrig <varodrig@redhat.com>

* Update content/en/docs/components/central-dash/profiles.md

Co-authored-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Valentina Rodriguez Sosa <64439402+varodrig@users.noreply.github.com>

* Update content/en/docs/components/central-dash/customize.md

Co-authored-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Valentina Rodriguez Sosa <64439402+varodrig@users.noreply.github.com>

* Update content/en/docs/components/central-dash/profiles.md

Co-authored-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Valentina Rodriguez Sosa <64439402+varodrig@users.noreply.github.com>

* Update content/en/docs/components/central-dash/profiles.md

Co-authored-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Valentina Rodriguez Sosa <64439402+varodrig@users.noreply.github.com>

* remove lines

Signed-off-by: varodrig <varodrig@redhat.com>

* fix typos

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

* fix typos 2

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

---------

Signed-off-by: varodrig <varodrig@redhat.com>
Signed-off-by: Valentina Rodriguez Sosa <64439402+varodrig@users.noreply.github.com>
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Co-authored-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
2025-02-20 18:04:43 +00:00
Valentina Rodriguez Sosa 83401cec8a
pipelines: Update wording Pipelines in Quick Start (#4004)
* update fix/broken links

Signed-off-by: varodrig <varodrig@redhat.com>

* update wording

Signed-off-by: varodrig <varodrig@redhat.com>

---------

Signed-off-by: varodrig <varodrig@redhat.com>
2025-02-18 15:05:41 +00:00
Gary Miguel 27e999822d
katib metrics-collector: mention supported writers (#3999)
* katib metrics-collector: mention supported writers

See https://github.com/kubeflow/katib/pull/2467

Signed-off-by: Gary Miguel <garymm@garymm.org>

* add 'metrics' word

Signed-off-by: Gary Miguel <garymm@garymm.org>

---------

Signed-off-by: Gary Miguel <garymm@garymm.org>
2025-02-15 01:03:37 +00:00
Andrey Velichkevich 8ad90c5a31
trainer: Add deprecation warning to Training Operator v1 docs (#3997)
* trainer: Add deprecation warning to Training Operator v1 docs

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix redirects

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-02-15 00:25:37 +00:00
Mathew Wicks 6aefa59614
Add Swagger UI for Model Registry (#3980)
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
2025-02-14 22:29:37 +00:00
Andrey Velichkevich 10b7063075
trainer: Initial Documentation for Kubeflow Trainer V2 (#3958)
* Kubeflow Trainer V2 Docs

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update index

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update Getting Started example

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Improve text

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Fix example

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Update diagram in light appearance

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-02-11 17:37:28 +00:00
Ratnopam Charabarti 51e4aeb21c
Update Spark Operator Prometheus Metrics Guide (#3983)
Signed-off-by: Ratnopam Chakrabarti <ratnopamc@yahoo.com>
2025-02-11 11:27:29 +00:00
M!l!nd 39839caf8a
Katib: Update env-variables.md (#3987)
Signed-off-by: M!l!nd <99114125+milinddethe15@users.noreply.github.com>
2025-02-10 16:49:06 +00:00