Apply Docs Restructure to `v1.2-branch` = update `v1.2-branch` to current `master` v2 (#2612)
* Create "Distributions" with kfctl + Kubeflow Operator (#2492) * Create methods folder == section * Move /operator under /methods * Update links on Operator * Add 'kfctl' folder == section * mv kfctl specific minikube docs under /kfctl * Update links on minikube * mv kustomize from other-guides to /kfctl * fix links for kustomize change * delete outdated redirect * move istio-dex-auth to /kfctl + rename to multi-user * fix links after name change * move kfctl install under /kfctl + rename to deployment * fix links after move * Add OWNERS for accountability Update kfctl description Update content/en/docs/methods/_index.md * Add redirects for Operator * Add redirects for kfctl * Rename "methods" to "distributions" * update redirects to distributions as folder name * doc: Add instructions to access cluster with IBM Cloud vpc-gen2. (#2530) * doc, Add instructions to access cluster with IBM Cloud vpc-gen2. * added extra steps. * Improved formatting * Added details for creating cluster against existing VPC * Apply suggestions from code review Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Formatting fixes, as per the review. * Added a note about security. * Choose between a classic or vpc-gen2 provider. * added a note * formatting fixes * Apply suggestions from code review Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Document split up. * Cleanup. * Apply suggestions from code review Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Tommy Li <Tommy.chaoping.li@ibm.com> * Formatting improvements and cleanup. * format fixes * Apply suggestions from code review Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> Co-authored-by: Tommy Li <Tommy.chaoping.li@ibm.com> * Add RFMVasconcelos to OWNERS/approvers (#2539) * Deletes old redirects - pages do not exist anymore (#2552) * Move `AWS` platform under /distributions (#2551) * move /aws under /distributions * fix AWS redirects + add catch-all * update broken link (#2557) * UPDATE fix broken links to tensoflorw serving (#2558) * Move `Google` platform under /distributions (#2547) * move /gke folder to under /distributions * update redirects * Move `Azure` platform under /distributions (#2548) * mv /azure to /distributions * add catch-all azure to redirects * KFP - Update Python function-based component doc with param naming rules (#2544) * Describe pipeline param naming Adds notes on how the KFP SDK updates param names to describe the data instead of the implementation. Updates passing data by value to indicate that users can pass lists and dictionaries. * Update auto-gen Markdown Updates python-function-components.md with changes to python-function-components.ipynb. 
* Move `Openshift` platform under /distributions (#2550) * move /openshift to under /distributions * add openshift catch-all to redirects * Move `IBM` platform under /distributions (#2549) * move /ibm to under /distributions * Add IBM catch-all to redirects * [IBM] Update openshift kubeflow installation (#2560) * Make kfctl first distribution (#2562) * Move getting started on K8s page to under kfctl distribution (#2569) * mv overview to under kfctl * delete empty getting started with k8s section * Add redirect to catch traffic * Update GCP distribution OWNERS (#2574) * Update KFP shortcodes OWNERS (#2575) * Move MicroK8s to distributions (#2577) * create microk8s folder in distributions * move microk8s docs to distributions * update title * Add redirect for MicroK8s move - missed on #2577 (#2579) * Add Charmed Kubeflow Operators to list of available Kubeflow distributions (#2578) * Uplevel clouds for a level playing field * Add Owners + Index of Charmed Kubeflow * Add install page to Charmed Kubeflow distribution * Link to Charmed Kubeflow docs * Naming corrections * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Update content/en/docs/distributions/charmed/install-kubeflow.md Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * final fixes Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> * Fix broken link (#2581) * IBM Cloud docs: update pipelines SDK setup for single-user (#2571) Made the following changes to the instructions for setting up the pipelines SDK for single-user. 
* append '/pipeline' to the host string * add client.list_experiments to make sure the setup is working, consistent with the multi-user example in section 2 * add a note about KUBEFLOW_PUBLIC_ENDPOINT_URL since the user may or may not have exposed the endpoint as a LoadBalancer Signed-off-by: Chin Huang <chhuang@us.ibm.com> * update broken links / tweak names (#2583) * Move MiniKF to distributions (#2576) * create minikf folde + index * move minikf docs to minikf folder * Add redirects for external links * Change naming according to request * update description minikf * Clean up "Frameworks for training" + rename to "Training Operators" (#2584) * Remove outdated banners from Pytorch and TF * delete chainer * order TF and pyT up * rename "Frameworks for training" to "Training operators" * Fix broken link (#2580) * Remove "outdated" banners from MPI + MXnet operators (#2585) * docs: Update MPI and MXNet operator pages (#2586) Signed-off-by: terrytangyuan <terrytangyuan@gmail.com> * Pin the version of kustomize, v4 is not supported. (#2572) * Pin the version of kustomize, v4 is not supported. There are issues installing Kubeflow with version v4. Note: https://github.com/kubeflow/website/issues/2570 https://github.com/kubeflow/kubeflow/issues/5755 * Add refrence to manifest repo version. * Default to 3.2.0 * Update gke/anthos.md (#2591) * fix broken link (#2603) Co-authored-by: Prashant Sharma <prashsh1@in.ibm.com> Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> Co-authored-by: Tommy Li <Tommy.chaoping.li@ibm.com> Co-authored-by: Mathew Wicks <thesuperzapper@users.noreply.github.com> Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com> Co-authored-by: Joe Liedtke <joeliedtke@google.com> Co-authored-by: Mofizur Rahman <moficodes@gmail.com> Co-authored-by: Yuan (Bob) Gong <4957653+Bobgy@users.noreply.github.com> Co-authored-by: Chin Huang <chhuang@us.ibm.com> Co-authored-by: brett koonce <koonce@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: drPytho <filip@voiapp.io> Co-authored-by: Ihor Sychevskyi <arhell333@gmail.com>
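The single-user SDK setup change from #2571 is easiest to see in code. A minimal sketch, assuming the kfp v1 SDK; the endpoint URL is a hypothetical placeholder for your own `KUBEFLOW_PUBLIC_ENDPOINT_URL`:

```python
import kfp

# Hypothetical public endpoint of your Kubeflow deployment. If you have not
# exposed the endpoint as a LoadBalancer, port-forward and use localhost.
KUBEFLOW_PUBLIC_ENDPOINT_URL = "http://<your-kubeflow-endpoint>"

# '/pipeline' is appended to the host string, per the change above.
client = kfp.Client(host=KUBEFLOW_PUBLIC_ENDPOINT_URL + "/pipeline")

# Quick sanity check that the client can reach the Pipelines API.
print(client.list_experiments())
```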
OWNERS

@@ -2,6 +2,7 @@ approvers:
 - animeshsingh
 - Bobgy
 - joeliedtke
+- RFMVasconcelos
 reviewers:
 - 8bitmp3
 - aronchick
@@ -9,9 +10,7 @@ reviewers:
 - dansanche
 - dsdinter
 - Jeffwan
 - jinchihe
-- jinchihe
 - nickchase
 - pdmack
+- RFMVasconcelos
 - terrytangyuan
-- terrytangyuan
@@ -34,22 +34,22 @@
 /docs/pipelines/tutorials/pipelines-tutorial/ /docs/components/pipelines/tutorials/cloud-tutorials/
 /docs/gke/pipelines-tutorial/ /docs/components/pipelines/tutorials/cloud-tutorials/
 /docs/gke/pipelines/pipelines-tutorial/ /docs/components/pipelines/tutorials/cloud-tutorials/
-/docs/gke/authentication-pipelines/ /docs/gke/pipelines/authentication-pipelines/
+/docs/gke/authentication-pipelines/ /docs/distributions/gke/pipelines/authentication-pipelines/

 /docs/pipelines/metrics/ /docs/components/pipelines/sdk/pipelines-metrics/
 /docs/pipelines/metrics/pipelines-metrics/ /docs/components/pipelines/sdk/pipelines-metrics/
 /docs/pipelines/metrics/output-viewer/ /docs/components/pipelines/sdk/output-viewer/
 /docs/pipelines/pipelines-overview/ /docs/components/pipelines/overview/pipelines-overview/
-/docs/pipelines/enable-gpu-and-tpu/ /docs/gke/pipelines/enable-gpu-and-tpu/
-/docs/pipelines/sdk/enable-gpu-and-tpu/ /docs/gke/pipelines/enable-gpu-and-tpu/
-/docs/pipelines/sdk/gcp/enable-gpu-and-tpu/ /docs/gke/pipelines/enable-gpu-and-tpu/
-/docs/pipelines/preemptible/ /docs/gke/pipelines/preemptible/
-/docs/pipelines/sdk/gcp/preemptible/ /docs/gke/pipelines/preemptible/
+/docs/pipelines/enable-gpu-and-tpu/ /docs/distributions/gke/pipelines/enable-gpu-and-tpu/
+/docs/pipelines/sdk/enable-gpu-and-tpu/ /docs/distributions/gke/pipelines/enable-gpu-and-tpu/
+/docs/pipelines/sdk/gcp/enable-gpu-and-tpu/ /docs/distributions/gke/pipelines/enable-gpu-and-tpu/
+/docs/pipelines/preemptible/ /docs/distributions/gke/pipelines/preemptible/
+/docs/pipelines/sdk/gcp/preemptible/ /docs/distributions/gke/pipelines/preemptible/
 /docs/pipelines/reusable-components/ /docs/examples/shared-resources/
 /docs/pipelines/sdk/reusable-components/ /docs/examples/shared-resources/

 # Moved the guide to monitoring GKE deployments.
-/docs/other-guides/monitoring/ /docs/gke/monitoring/
+/docs/other-guides/monitoring/ /docs/distributions/gke/monitoring/

 # Created a new section for pipeline concepts.
 /docs/pipelines/pipelines-concepts/ /docs/components/pipelines/concepts/
@@ -88,24 +88,20 @@ docs/started/requirements/ /docs/started/getting-started/
 # Restructured the getting-started and other-guides sections.
 /docs/started/getting-started-k8s/ /docs/started/k8s/
 /docs/started/getting-started-minikf/ /docs/started/workstation/getting-started-minikf/
-/docs/started/getting-started-minikube/ /docs/started/workstation/minikube-linux/
+/docs/started/getting-started-minikube/ /docs/started/distributions/kfctl/minikube/
 /docs/other-guides/virtual-dev/getting-started-minikf/ /docs/started/workstation/getting-started-minikf/
 /docs/started/getting-started-multipass/ /docs/started/workstation/getting-started-multipass/
 /docs/other-guides/virtual-dev/getting-started-multipass/ /docs/started/workstation/getting-started-multipass/
 /docs/other-guides/virtual-dev/ /docs/started/workstation/
 /docs/started/getting-started-aws/ /docs/started/cloud/getting-started-aws/
 /docs/started/getting-started-azure/ /docs/started/cloud/getting-started-azure/
 /docs/started/getting-started-gke/ /docs/started/cloud/getting-started-gke/
 /docs/started/getting-started-iks/ /docs/started/cloud/getting-started-iks/

 /docs/use-cases/kubeflow-on-multinode-cluster/ /docs/other-guides/kubeflow-on-multinode-cluster/
 /docs/use-cases/job-scheduling/ /docs/other-guides/job-scheduling/

 # Remove Kubeflow installation on existing EKS cluster
-/docs/aws/deploy/existing-cluster/ /docs/aws/deploy/install-kubeflow/
+/docs/aws/deploy/existing-cluster/ /docs/distributions/aws/deploy/install-kubeflow/

 # Move the kustomize guide to the config section
-/docs/components/misc/kustomize/ /docs/other-guides/kustomize/
+/docs/components/misc/kustomize/ /docs/distributions/kfctl/kustomize/

 # Merged the UIs page with the new central dashboard page
 /docs/other-guides/accessing-uis/ /docs/components/central-dash/overview/
@@ -116,12 +112,38 @@ docs/started/requirements/ /docs/started/getting-started/
 # Rename TensorRT Inference Server to Triton Inference Server
 /docs/components/serving/trtinferenceserver /docs/components/serving/tritoninferenceserver

+# Kubeflow Operator move to under distributions
+/docs/operator /docs/distributions/operator
+/docs/operator/introduction /docs/distributions/operator/introduction
+/docs/operator/install-operator /docs/distributions/operator/install-operator
+/docs/operator/install-kubeflow /docs/distributions/operator/install-kubeflow
+/docs/operator/uninstall-kubeflow /docs/distributions/operator/uninstall-kubeflow
+/docs/operator/uninstall-operator /docs/distributions/operator/uninstall-operator
+/docs/operator/troubleshooting /docs/distributions/operator/troubleshooting
+
+# kfctl move to under distributions
+/docs/started/workstation/minikube-linux /docs/distributions/kfctl/minikube
+/docs/other-guides/kustomize /docs/distributions/kfctl/kustomize
+/docs/started/k8s/kfctl-istio-dex /docs/distributions/kfctl/multi-user
+/docs/started/k8s/kfctl-k8s-istio /docs/distributions/kfctl/deployment
+
+# Moved Job scheduling under Training
+/docs/other-guides/job-scheduling/ /docs/components/training/job-scheduling/
+
+# Moved KFServing
+/docs/components/serving/kfserving/ /docs/components/kfserving
+
+# Moved MicroK8s to distributions
+/docs/started/workstation/kubeflow-on-microk8s /docs/distributions/microk8s/kubeflow-on-microk8s
+
+# Moved K8s deployment overview to under kfctl
+/docs/started/k8s/overview /docs/distributions/kfctl/overview
+
+# Moved MiniKF to distributions
+/docs/started/workstation/getting-started-minikf /docs/distributions/getting-started-minikf
+/docs/started/workstation/minikf-aws /docs/distributions/minikf-aws
+/docs/started/workstation/minikf-gcp /docs/distributions/minikf-gcp
+
 # ===============
 # IMPORTANT NOTE:
 # Catch-all redirects should be added at the end of this file as redirects happen from top to bottom
@@ -129,3 +151,8 @@ docs/started/requirements/ /docs/started/getting-started/
 /docs/guides/* /docs/:splat
 /docs/pipelines/concepts/* /docs/components/pipelines/overview/concepts/:splat
 /docs/pipelines/* /docs/components/pipelines/:splat
+/docs/aws/* /docs/distributions/aws/:splat
+/docs/azure/* /docs/distributions/azure/:splat
+/docs/gke/* /docs/distributions/gke/:splat
+/docs/ibm/* /docs/distributions/ibm/:splat
+/docs/openshift/* /docs/distributions/openshift/:splat
@@ -74,7 +74,7 @@ Port-forwarding typically does not work if any of the following are true:
 with the [CLI deployment](/docs/gke/deploy/deploy-cli/). (If you want to
 use port forwarding, you must deploy Kubeflow on an existing Kubernetes
 cluster using the [`kfctl_k8s_istio`
-configuration](/docs/started/k8s/kfctl-k8s-istio/).)
+configuration](/docs/methods/kfctl/deployment).)

 * You've configured the Istio ingress to only accept
 HTTPS traffic on a specific domain or IP address.
@@ -755,7 +755,7 @@ kubectl apply -f <your-path/your-experiment-config.yaml>
 - (Optional) Katib's experiments don't work with
 [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection).
 If you install Kubeflow using
-[Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/),
+[Istio config](https://www.kubeflow.org/docs/methods/kfctl/deployment),
 you have to disable sidecar injection. To do that, specify this annotation:
 `sidecar.istio.io/inject: "false"` in your experiment's trial template. For
 examples on how to do it for `Job`, `TFJob` (TensorFlow) or
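Where exactly that annotation sits trips people up, so here is a minimal sketch of a trial template carrying it, written as a Python dict for illustration. The field layout assumes Katib's v1beta1 Experiment API (older releases used a goTemplate-based trial template), and the container image and command are hypothetical:

```python
import yaml

experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    "metadata": {"name": "random-example", "namespace": "kubeflow"},
    "spec": {
        "trialTemplate": {
            "trialSpec": {
                "apiVersion": "batch/v1",
                "kind": "Job",
                "spec": {
                    "template": {
                        "metadata": {
                            # The annotation that disables Istio sidecar
                            # injection for the trial pods:
                            "annotations": {"sidecar.istio.io/inject": "false"}
                        },
                        "spec": {
                            "containers": [{
                                "name": "training-container",
                                # Hypothetical trainer image and entrypoint.
                                "image": "docker.io/example/trainer:latest",
                                "command": ["python", "train.py"],
                            }],
                            "restartPolicy": "Never",
                        },
                    }
                },
            }
        }
    },
}

# Render as YAML for `kubectl apply -f -`.
print(yaml.safe_dump(experiment))
```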
@@ -141,7 +141,7 @@ an experiment using the random algorithm example:
 1. (Optional) **Note:** Katib's experiments don't work with
 [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection).
 If you installed Kubeflow using
-[Istio config](/docs/started/k8s/kfctl-k8s-istio/),
+[Istio config](/docs/methods/kfctl/deployment),
 you have to disable sidecar injection. To do that, specify this annotation:
 `sidecar.istio.io/inject: "false"` in your experiment's trial template.
@@ -394,7 +394,7 @@ the Kubeflow's TensorFlow training job operator, TFJob:
 1. (Optional) **Note:** Katib's experiments don't work with
 [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection).
 If you installed Kubeflow using
-[Istio config](/docs/started/k8s/kfctl-k8s-istio/),
+[Istio config](/docs/methods/kfctl/deployment),
 you have to disable sidecar injection. To do that, specify this annotation:
 `sidecar.istio.io/inject: "false"` in your experiment's trial template.
 For the provided `TFJob` example check
@@ -438,7 +438,7 @@ using Kubeflow's PyTorch training job operator, PyTorchJob:
 1. (Optional) **Note:** Katib's experiments don't work with
 [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection).
 If you installed Kubeflow using
-[Istio config](/docs/started/k8s/kfctl-k8s-istio/),
+[Istio config](/docs/methods/kfctl/deployment),
 you have to disable sidecar injection. To do that, specify this annotation:
 `sidecar.istio.io/inject: "false"` in your experiment's trial template.
 For the provided `PyTorchJob` example setting the annotation should be similar to
@@ -133,7 +133,7 @@ You can use the following interfaces to interact with Katib:

 - **kfctl** is the Kubeflow CLI that you can use to install and configure
 Kubeflow. Learn about kfctl in the guide to
-[configuring Kubeflow](/docs/other-guides/kustomize/).
+[configuring Kubeflow](/docs/methods/kfctl/kustomize/).

 - The Kubernetes CLI, **kubectl**, is useful for running commands against your
 Kubeflow cluster. Learn about kubectl in the [Kubernetes
@@ -53,7 +53,7 @@ master should share the same identity management.

 ## Supported platforms
 * Kubeflow multi-tenancy is enabled by default if you deploy Kubeflow on GCP with [IAP](/docs/gke/deploy).
-* If you are not on GCP, you can deploy multi-tenancy to [your existing cluster](/docs/started/k8s/kfctl-istio-dex/).
+* If you are not on GCP, you can deploy multi-tenancy to [your existing cluster](/docs/methods/kfctl/multi-user).

 ## Next steps
@@ -56,7 +56,7 @@ A _pipeline component_ is a self-contained set of user code, packaged as a
 performs one step in the pipeline. For example, a component can be responsible
 for data preprocessing, data transformation, model training, and so on.

-See the conceptual guides to [pipelines](/docs/components/pipelines/concepts/pipeline/)
+See the conceptual guides to [pipelines](/docs/components/pipelines/overview/concepts/pipeline/)
 and [components](/docs/components/pipelines/concepts/component/).

 ## Example of a pipeline
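To make the "self-contained step" idea concrete, here is a minimal sketch of a one-step pipeline using the kfp v1 SDK (assumed here); the function body and names are illustrative only:

```python
import kfp
import kfp.components as comp

def preprocess(text: str) -> str:
    """A single self-contained pipeline step: trivial data preprocessing."""
    return text.strip().lower()

# Package the function as a component, i.e. a containerized pipeline step.
preprocess_op = comp.create_component_from_func(
    preprocess, base_image="python:3.7")

@kfp.dsl.pipeline(name="example-pipeline", description="A one-step pipeline.")
def example_pipeline(text: str = "Hello"):
    preprocess_op(text)
```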
@@ -5,7 +5,7 @@ weight = 140

 +++

-You can use the [KFP-Tekton SDK](https://github.com/kubeflow/kfp-tekton/sdk)
+You can use the [KFP-Tekton SDK](https://github.com/kubeflow/kfp-tekton/tree/master/sdk)
 to compile, upload and run your Kubeflow Pipeline DSL Python scripts on a
 [Kubeflow Pipelines with Tekton backend](https://github.com/kubeflow/kfp-tekton/tree/master/tekton_kfp_guide.md).
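As a rough sketch of what that compilation step looks like, assuming the kfp v1 DSL and the kfp-tekton package (the pipeline itself is a placeholder):

```python
from kfp import dsl
from kfp_tekton.compiler import TektonCompiler

@dsl.pipeline(name="echo", description="A trivial placeholder pipeline.")
def echo_pipeline():
    # One container step that just prints a message.
    dsl.ContainerOp(name="echo", image="alpine", command=["echo", "hello"])

# Emits Tekton YAML instead of the Argo YAML produced by kfp's default compiler.
TektonCompiler().compile(echo_pipeline, "echo_pipeline.yaml")
```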
@@ -287,14 +287,55 @@
     " storage service. Kubeflow Pipelines passes parameters to your component by\n",
     " file, by passing their paths as a command-line argument.\n",
     "\n",
+    "<a name=\"parameter-names\"></a>\n",
+    "#### Input and output parameter names\n",
+    "\n",
+    "When you use the Kubeflow Pipelines SDK to convert your Python function to a\n",
+    "pipeline component, the Kubeflow Pipelines SDK uses the function's interface\n",
+    "to define the interface of your component in the following ways.\n",
+    "\n",
+    "* Some arguments define input parameters.\n",
+    "* Some arguments define output parameters.\n",
+    "* The function's return value is used as an output parameter. If the return\n",
+    "  value is a [`collections.namedtuple`][named-tuple], the named tuple is used\n",
+    "  to return several small values. \n",
+    "\n",
+    "Since you can pass parameters between components as a value or as a path, the\n",
+    "Kubeflow Pipelines SDK removes common parameter suffixes that leak the\n",
+    "component's expected implementation. For example, a Python function-based\n",
+    "component that ingests data and outputs CSV data may have an output argument\n",
+    "that is defined as `csv_path: comp.OutputPath(str)`. In this case, the output\n",
+    "is the CSV data, not the path. So, the Kubeflow Pipelines SDK simplifies the\n",
+    "output name to `csv`.\n",
+    "\n",
+    "The Kubeflow Pipelines SDK uses the following rules to define the input and\n",
+    "output parameter names in your component's interface:\n",
+    "\n",
+    "* If the argument name ends with `_path` and the argument is annotated as an\n",
+    "  [`kfp.components.InputPath`][input-path] or\n",
+    "  [`kfp.components.OutputPath`][output-path], the parameter name is the\n",
+    "  argument name with the trailing `_path` removed.\n",
+    "* If the argument name ends with `_file`, the parameter name is the argument\n",
+    "  name with the trailing `_file` removed.\n",
+    "* If you return a single small value from your component using the `return`\n",
+    "  statement, the output parameter is named `output`.\n",
+    "* If you return several small values from your component by returning a \n",
+    "  [`collections.namedtuple`][named-tuple], the Kubeflow Pipelines SDK uses\n",
+    "  the tuple's field names as the output parameter names. \n",
+    "\n",
+    "Otherwise, the Kubeflow Pipelines SDK uses the argument name as the parameter\n",
+    "name.\n",
+    "\n",
     "<a name=\"pass-by-value\"></a>\n",
     "#### Passing parameters by value\n",
     "\n",
     "Python function-based components make it easier to pass parameters between\n",
     "components by value (such as numbers, booleans, and short strings), by letting\n",
     "you define your component’s interface by annotating your Python function. The\n",
-    "supported types are `int`, `float`, `bool`, and `string`. If you do not\n",
-    "annotate your function, these input parameters are passed as strings.\n",
+    "supported types are `int`, `float`, `bool`, and `str`. You can also pass \n",
+    "`list` or `dict` instances by value, if they contain small values, such as\n",
+    "`int`, `float`, `bool`, or `str` values. If you do not annotate your function,\n",
+    "these input parameters are passed as strings.\n",
     "\n",
     "If your component returns multiple outputs by value, annotate your function\n",
     "with the [`typing.NamedTuple`][named-tuple-hint] type hint and use the\n",
@@ -320,7 +361,9 @@
     "[named-tuple-hint]: https://docs.python.org/3/library/typing.html#typing.NamedTuple\n",
     "[named-tuple]: https://docs.python.org/3/library/collections.html#collections.namedtuple\n",
     "[kfp-visualize]: https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/\n",
-    "[kfp-metrics]: https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/"
+    "[kfp-metrics]: https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/\n",
+    "[input-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.components.html#kfp.components.InputPath\n",
+    "[output-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.components.html#kfp.components.OutputPath"
    ]
   },
   {
@@ -5,7 +5,7 @@ weight = 50
 +++

 <!--
-AUTOGENERATED FROM content/en/docs/components/pipelines/sdk/python-function-components.ipynb
+AUTOGENERATED FROM content/en/docs/pipelines/sdk/python-function-components.ipynb
 PLEASE UPDATE THE JUPYTER NOTEBOOK AND REGENERATE THIS FILE USING scripts/nb_to_md.py.-->

 <style>
@@ -26,8 +26,8 @@ background-position: left center;
 }
 </style>
 <div class="notebook-links">
-<a class="colab-link" href="https://colab.research.google.com/github/kubeflow/website/blob/master/content/en/docs/components/pipelines/sdk/python-function-components.ipynb">Run in Google Colab</a>
-<a class="github-link" href="https://github.com/kubeflow/website/blob/master/content/en/docs/components/pipelines/sdk/python-function-components.ipynb">View source on GitHub</a>
+<a class="colab-link" href="https://colab.research.google.com/github/kubeflow/website/blob/master/content/en/docs/pipelines/sdk/python-function-components.ipynb">Run in Google Colab</a>
+<a class="github-link" href="https://github.com/kubeflow/website/blob/master/content/en/docs/pipelines/sdk/python-function-components.ipynb">View source on GitHub</a>
 </div>
@@ -257,14 +257,55 @@ The following sections describe how to pass parameters by value and by file.
 storage service. Kubeflow Pipelines passes parameters to your component by
 file, by passing their paths as a command-line argument.

+<a name="parameter-names"></a>
+#### Input and output parameter names
+
+When you use the Kubeflow Pipelines SDK to convert your Python function to a
+pipeline component, the Kubeflow Pipelines SDK uses the function's interface
+to define the interface of your component in the following ways.
+
+* Some arguments define input parameters.
+* Some arguments define output parameters.
+* The function's return value is used as an output parameter. If the return
+  value is a [`collections.namedtuple`][named-tuple], the named tuple is used
+  to return several small values.
+
+Since you can pass parameters between components as a value or as a path, the
+Kubeflow Pipelines SDK removes common parameter suffixes that leak the
+component's expected implementation. For example, a Python function-based
+component that ingests data and outputs CSV data may have an output argument
+that is defined as `csv_path: comp.OutputPath(str)`. In this case, the output
+is the CSV data, not the path. So, the Kubeflow Pipelines SDK simplifies the
+output name to `csv`.
+
+The Kubeflow Pipelines SDK uses the following rules to define the input and
+output parameter names in your component's interface:
+
+* If the argument name ends with `_path` and the argument is annotated as an
+  [`kfp.components.InputPath`][input-path] or
+  [`kfp.components.OutputPath`][output-path], the parameter name is the
+  argument name with the trailing `_path` removed.
+* If the argument name ends with `_file`, the parameter name is the argument
+  name with the trailing `_file` removed.
+* If you return a single small value from your component using the `return`
+  statement, the output parameter is named `output`.
+* If you return several small values from your component by returning a
+  [`collections.namedtuple`][named-tuple], the Kubeflow Pipelines SDK uses
+  the tuple's field names as the output parameter names.
+
+Otherwise, the Kubeflow Pipelines SDK uses the argument name as the parameter
+name.
+
 <a name="pass-by-value"></a>
 #### Passing parameters by value

 Python function-based components make it easier to pass parameters between
 components by value (such as numbers, booleans, and short strings), by letting
 you define your component’s interface by annotating your Python function. The
-supported types are `int`, `float`, `bool`, and `string`. If you do not
-annotate your function, these input parameters are passed as strings.
+supported types are `int`, `float`, `bool`, and `str`. You can also pass
+`list` or `dict` instances by value, if they contain small values, such as
+`int`, `float`, `bool`, or `str` values. If you do not annotate your function,
+these input parameters are passed as strings.

 If your component returns multiple outputs by value, annotate your function
 with the [`typing.NamedTuple`][named-tuple-hint] type hint and use the
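A compact sketch of the naming rules added in this hunk, assuming the kfp v1 SDK; the function and its argument names are illustrative, not taken from the docs:

```python
from typing import NamedTuple
import kfp.components as comp

def ingest(url: str,
           csv_path: comp.OutputPath(str),
           ) -> NamedTuple("Outputs", [("row_count", int)]):
    # "url" stays input "url"; "csv_path" is exposed as output "csv"
    # (trailing "_path" removed); the named tuple field becomes output
    # "row_count".
    with open(csv_path, "w") as f:
        f.write("a,b\n1,2\n")
    return (2,)

ingest_op = comp.create_component_from_func(ingest)
# ingest_op's component spec should now list input "url" and
# outputs "csv" and "row_count".
```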
@@ -291,6 +332,8 @@ including component metadata and metrics.
 [named-tuple]: https://docs.python.org/3/library/collections.html#collections.namedtuple
 [kfp-visualize]: https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/
 [kfp-metrics]: https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/
+[input-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.components.html#kfp.components.InputPath
+[output-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.components.html#kfp.components.OutputPath


 ```python
@@ -540,6 +583,6 @@ client.create_run_from_pipeline_func(calc_pipeline, arguments=arguments)


 <div class="notebook-links">
-<a class="colab-link" href="https://colab.research.google.com/github/kubeflow/website/blob/master/content/en/docs/components/pipelines/sdk/python-function-components.ipynb">Run in Google Colab</a>
-<a class="github-link" href="https://github.com/kubeflow/website/blob/master/content/en/docs/components/pipelines/sdk/python-function-components.ipynb">View source on GitHub</a>
+<a class="colab-link" href="https://colab.research.google.com/github/kubeflow/website/blob/master/content/en/docs/pipelines/sdk/python-function-components.ipynb">Run in Google Colab</a>
+<a class="github-link" href="https://github.com/kubeflow/website/blob/master/content/en/docs/pipelines/sdk/python-function-components.ipynb">View source on GitHub</a>
 </div>
@@ -1,5 +1,5 @@
 +++
-title = "Frameworks for Training"
-description = "Training of ML models in Kubeflow"
+title = "Training Operators"
+description = "Training of ML models in Kubeflow through operators"
 weight = 70
 +++
@@ -1,19 +0,0 @@
-+++
-title = "Chainer Training"
-description = "See Kubeflow [v0.6 docs](https://v0-6.kubeflow.org/docs/components/training/chainer/) for instructions on using Chainer for training"
-weight = 4
-toc = true
-
-+++
-{{% alert title="Out of date" color="warning" %}}
-This guide contains outdated information pertaining to Kubeflow 1.0. This guide
-needs to be updated for Kubeflow 1.1.
-{{% /alert %}}
-
-{{% alpha-status
-  feedbacklink="https://github.com/kubeflow/chainer-operator/issues" %}}
-
-[Chainer](https://github.com/kubeflow/chainer-operator) is not supported in
-Kubeflow versions greater than v0.6. See the [Kubeflow v0.6
-documentation](https://v0-6.kubeflow.org/docs/components/training/chainer/)
-for earlier support for Chainer training.
@@ -4,10 +4,6 @@ description = "Instructions for using MPI for training"
 weight = 25

 +++
-{{% alert title="Out of date" color="warning" %}}
-This guide contains outdated information pertaining to Kubeflow 1.0. This guide
-needs to be updated for Kubeflow 1.1.
-{{% /alert %}}

 {{% alpha-status
   feedbacklink="https://github.com/kubeflow/mpi-operator/issues" %}}
@@ -26,7 +22,7 @@ cd mpi-operator
 kubectl create -f deploy/v1alpha2/mpi-operator.yaml
 ```

-Alternatively, follow the [getting started guide](/docs/started/getting-started/) to deploy Kubeflow.
+Alternatively, follow the [getting started guide](https://www.kubeflow.org/docs/started/getting-started/) to deploy Kubeflow.

 An alpha version of MPI support was introduced with Kubeflow 0.2.0. You must be using a version of Kubeflow newer than 0.2.0.
@@ -48,9 +44,9 @@ mpijobs.kubeflow.org 4d
 If it is not included you can add it as follows using [kustomize](https://github.com/kubernetes-sigs/kustomize):

 ```bash
-git clone https://github.com/kubeflow/manifests
-cd manifests/mpi-job/mpi-operator
-kustomize build base | kubectl apply -f -
+git clone https://github.com/kubeflow/mpi-operator
+cd mpi-operator/manifests
+kustomize build overlays/kubeflow | kubectl apply -f -
 ```

 Note that since Kubernetes v1.14, `kustomize` became a subcommand in `kubectl` so you can also run the following command instead:
@@ -66,6 +62,7 @@ You can create an MPI job by defining an `MPIJob` config file. See [TensorFlow b
 ```
 cat examples/v1alpha2/tensorflow-benchmarks.yaml
 ```
+
 Deploy the `MPIJob` resource to start training:

 ```
@@ -166,7 +163,6 @@ status:
   startTime: "2019-07-09T22:15:51Z"
 ```

-
 Training should run for 100 steps and takes a few minutes on a GPU cluster. You can inspect the logs to see the training progress. When the job starts, access the logs from the `launcher` pod:

 ```
@@ -192,20 +188,20 @@ Variables: horovod

 ...

-40 images/sec: 154.4 +/- 0.7 (jitter = 4.0) 8.280
-50 images/sec: 154.8 +/- 0.6 (jitter = 4.0) 8.397
-60 images/sec: 154.5 +/- 0.5 (jitter = 4.1) 8.321
-70 images/sec: 154.5 +/- 0.5 (jitter = 4.0) 8.433
-80 images/sec: 154.8 +/- 0.4 (jitter = 3.6) 8.199
-90 images/sec: 154.6 +/- 0.4 (jitter = 3.7) 8.418
-100 images/sec: 154.2 +/- 0.4 (jitter = 4.0) 8.372
+40 images/sec: 154.4 +/- 0.7 (jitter = 4.1) 8.482
+50 images/sec: 154.8 +/- 0.6 (jitter = 4.2) 8.450
+60 images/sec: 154.5 +/- 0.5 (jitter = 4.4) 8.349
+70 images/sec: 154.5 +/- 0.5 (jitter = 4.4) 8.430
+80 images/sec: 154.8 +/- 0.4 (jitter = 3.8) 8.404
+90 images/sec: 154.6 +/- 0.4 (jitter = 3.6) 8.459
+100 images/sec: 154.2 +/- 0.4 (jitter = 4.0) 8.542
 ----------------------------------------------------------------
 total images/sec: 308.27
 ```
@@ -214,5 +210,5 @@ total images/sec: 308.27

 Docker images are built and pushed automatically to [mpioperator on Dockerhub](https://hub.docker.com/u/mpioperator). You can use the following Dockerfiles to build the images yourself:

-* [mpi-operator](https://github.com/kubeflow/mpi-operator/blob/master/Dockerfile)
-* [kubectl-delivery](https://github.com/kubeflow/mpi-operator/blob/master/cmd/kubectl-delivery/Dockerfile)
+- [mpi-operator](https://github.com/kubeflow/mpi-operator/blob/master/Dockerfile)
+- [kubectl-delivery](https://github.com/kubeflow/mpi-operator/blob/master/cmd/kubectl-delivery/Dockerfile)
@@ -4,31 +4,34 @@ description = "Instructions for using MXNet"
 weight = 25

 +++
-{{% alert title="Out of date" color="warning" %}}
-This guide contains outdated information pertaining to Kubeflow 1.0. This guide
-needs to be updated for Kubeflow 1.1.
-{{% /alert %}}

 {{% alpha-status
   feedbacklink="https://github.com/kubeflow/mxnet-operator/issues" %}}

-This guide walks you through using MXNet with Kubeflow.
+This guide walks you through using [Apache MXNet (incubating)](https://github.com/apache/incubator-mxnet) with Kubeflow.

-## Installing MXNet Operator
+MXNet Operator provides a Kubernetes custom resource `MXJob` that makes it easy to run distributed or non-distributed
+Apache MXNet jobs (training and tuning) and other extended frameworks like [BytePS](https://github.com/bytedance/byteps)
+jobs on Kubernetes. Using a Custom Resource Definition (CRD) gives users the ability to create
+and manage Apache MXNet jobs just like built-in K8S resources.

-If you haven't already done so please follow the [Getting Started Guide](https://www.kubeflow.org/docs/started/getting-started/) to deploy Kubeflow.
+## Installing the MXJob CRD and operator on your k8s cluster

-A version of MXNet support was introduced with Kubeflow 0.2.0. You must be using a version of Kubeflow newer than 0.2.0.
+### Deploy MXJob CRD and Apache MXNet Operator

-## Verify that MXNet support is included in your Kubeflow deployment
+```
+kustomize build manifests/overlays/v1 | kubectl apply -f -
+```

-Check that the MXNet custom resource is installed
+### Verify that MXJob CRD and Apache MXNet Operator are installed
+
+Check that the Apache MXNet custom resource is installed via:

 ```
 kubectl get crd
 ```

-The output should include `mxjobs.kubeflow.org`
+The output should include `mxjobs.kubeflow.org` like the following:

 ```
 NAME AGE
@@ -37,72 +40,119 @@ mxjobs.kubeflow.org 4d
 ...
 ```

-If it is not included you can add it as follows
+Check that the Apache MXNet operator is running via:

 ```
-git clone https://github.com/kubeflow/manifests
-cd manifests/mxnet-job/mxnet-operator
-kubectl kustomize base | kubectl apply -f -
+kubectl get pods
 ```

-Alternatively, you can deploy the operator with default settings without using kustomize by running the following from the repo:
+The output should include `mxnet-operator-xxx` like the following:

 ```
-git clone https://github.com/kubeflow/mxnet-operator.git
-cd mxnet-operator
-kubectl create -f manifests/crd-v1beta1.yaml
-kubectl create -f manifests/rbac.yaml
-kubectl create -f manifests/deployment.yaml
+NAME                             READY   STATUS    RESTARTS   AGE
+mxnet-operator-d466b46bc-xbqvs   1/1     Running   0          4m37s
 ```

-## Creating a MXNet training job
+### Creating an Apache MXNet training job

-You create a training job by defining a MXJob with MXTrain mode and then creating it with
+You create a training job by defining a `MXJob` with `MXTrain` mode and then creating it with:

 ```
-kubectl create -f examples/v1beta1/train/mx_job_dist_gpu.yaml
+kubectl create -f examples/train/mx_job_dist_gpu_v1.yaml
 ```

+Each `replicaSpec` defines a set of Apache MXNet processes.
+The `mxReplicaType` defines the semantics for the set of processes.
+The semantics are as follows:
+
+**scheduler**
+* A job must have 1 and only 1 scheduler
+* The pod must contain a container named mxnet
+* The overall status of the `MXJob` is determined by the exit code of the
+  mxnet container
+  * 0 = success
+  * 1 || 2 || 126 || 127 || 128 || 139 = permanent errors:
+    * 1: general errors
+    * 2: misuse of shell builtins
+    * 126: command invoked cannot execute
+    * 127: command not found
+    * 128: invalid argument to exit
+    * 139: container terminated by SIGSEGV (invalid memory reference)
+  * 130 || 137 || 143 = retryable errors for unexpected system signals:
+    * 130: container terminated by Control-C
+    * 137: container received a SIGKILL
+    * 143: container received a SIGTERM
+  * 138 = reserved in tf-operator for user-specified retryable errors
+  * others = undefined and no guarantee
+
+**worker**
+* A job can have 0 to N workers
+* The pod must contain a container named mxnet
+* Workers are automatically restarted if they exit
+
+**server**
+* A job can have 0 to N servers
+* Parameter servers are automatically restarted if they exit
+
+For each replica you define a **template** which is a K8S
+[PodTemplateSpec](https://kubernetes.io/docs/api-reference/v1.8/#podtemplatespec-v1-core).
+The template allows you to specify the containers, volumes, etc. that
+should be created for each replica.
+
-## Creating a TVM tuning job (AutoTVM)
+### Creating a TVM tuning job (AutoTVM)

 [TVM](https://docs.tvm.ai/tutorials/) is an end-to-end deep learning compiler stack; you can easily run AutoTVM with mxnet-operator.
 You can create an auto-tuning job by defining a type of MXTune job and then creating it with:

 ```
-kubectl create -f examples/v1beta1/tune/mx_job_tune_gpu.yaml
+kubectl create -f examples/tune/mx_job_tune_gpu_v1.yaml
 ```

-Before you use the auto-tuning example, there is some preparatory work need to be finished in advance. To let TVM tune your network, you should create a docker image which has TVM module. Then, you need a auto-tuning script to specify which network will be tuned and set the auto-tuning parameters, For more details, please see https://docs.tvm.ai/tutorials/autotvm/tune_relay_mobile_gpu.html#sphx-glr-tutorials-autotvm-tune-relay-mobile-gpu-py. Finally, you need a startup script to start the auto-tuning program. In fact, mxnet-operator will set all the parameters as environment variables and the startup script need to reed these variable and then transmit them to auto-tuning script. We provide an example under examples/v1beta1/tune/, tuning result will be saved in a log file like resnet-18.log in the example we gave. You can refer it for details.
+Before you use the auto-tuning example, there is some preparatory work that needs to be finished in advance.
+To let TVM tune your network, you should create a Docker image which has the TVM module.
+Then, you need an auto-tuning script to specify which network will be tuned and to set the auto-tuning parameters.
+For more details, please see the [tutorials](https://docs.tvm.ai/tutorials/autotvm/tune_relay_mobile_gpu.html#sphx-glr-tutorials-autotvm-tune-relay-mobile-gpu-py).
+Finally, you need a startup script to start the auto-tuning program. In fact, mxnet-operator will set all the parameters as environment variables, and the startup script needs to read these variables and then pass them to the auto-tuning script.
+We provide an example under `examples/tune/`; the tuning result will be saved in a log file like resnet-18.log in the example we gave. You can refer to it for details.
+
+### Using GPUs
+
+MXNet Operator supports training with GPUs.
+
+Please verify your image is available for distributed training with GPUs.
+
+For example, if you have the following, MXNet Operator will arrange the pod to nodes to satisfy the GPU limit.
+
+```
+command: ["python"]
+args: ["/incubator-mxnet/example/image-classification/train_mnist.py","--num-epochs","1","--num-layers","2","--kv-store","dist_device_sync","--gpus","0"]
+resources:
+  limits:
+    nvidia.com/gpu: 1
+```

-## Monitoring a MXNet Job
+### Monitoring your Apache MXNet job

 To get the status of your job

-```bash
-kubectl get -o yaml mxjobs ${JOB}
+```
+kubectl get -o yaml mxjobs $JOB
 ```

 Here is sample output for an example job

 ```yaml
-apiVersion: kubeflow.org/v1beta1
+apiVersion: kubeflow.org/v1
 kind: MXJob
 metadata:
-  creationTimestamp: 2019-03-19T09:24:27Z
+  creationTimestamp: 2021-03-24T15:37:27Z
   generation: 1
   name: mxnet-job
   namespace: default
-  resourceVersion: "3681685"
-  selfLink: /apis/kubeflow.org/v1beta1/namespaces/default/mxjobs/mxnet-job
-  uid: cb11013b-4a28-11e9-b7f4-704d7bb59f71
+  resourceVersion: "5123435"
+  selfLink: /apis/kubeflow.org/v1/namespaces/default/mxjobs/mxnet-job
+  uid: xx11013b-4a28-11e9-s5a1-704d7bb912f91
 spec:
   cleanPodPolicy: All
   jobMode: MXTrain
@@ -164,22 +214,22 @@ spec:
         limits:
           nvidia.com/gpu: "1"
 status:
-  completionTime: 2019-03-19T09:25:11Z
+  completionTime: 2021-03-24T09:25:11Z
   conditions:
-  - lastTransitionTime: 2019-03-19T09:24:27Z
-    lastUpdateTime: 2019-03-19T09:24:27Z
+  - lastTransitionTime: 2021-03-24T15:37:27Z
+    lastUpdateTime: 2021-03-24T15:37:27Z
     message: MXJob mxnet-job is created.
     reason: MXJobCreated
     status: "True"
     type: Created
-  - lastTransitionTime: 2019-03-19T09:24:27Z
-    lastUpdateTime: 2019-03-19T09:24:29Z
+  - lastTransitionTime: 2021-03-24T15:37:27Z
+    lastUpdateTime: 2021-03-24T15:37:29Z
     message: MXJob mxnet-job is running.
     reason: MXJobRunning
     status: "False"
     type: Running
-  - lastTransitionTime: 2019-03-19T09:24:27Z
-    lastUpdateTime: 2019-03-19T09:25:11Z
+  - lastTransitionTime: 2021-03-24T15:37:27Z
+    lastUpdateTime: 2021-03-24T09:25:11Z
     message: MXJob mxnet-job is successfully completed.
     reason: MXJobSucceeded
     status: "True"
@@ -188,5 +238,51 @@ status:
     Scheduler: {}
     Server: {}
     Worker: {}
-  startTime: 2019-03-19T09:24:29Z
+  startTime: 2021-03-24T15:37:29Z
 ```
+
+The first thing to note is the **RuntimeId**. This is a random unique
+string which is used to give names to all the K8s resources
+(e.g. Job controllers & services) that are created by the `MXJob`.
+
+As with other K8S resources, status provides information about the state
+of the resource.
+
+**phase** - Indicates the phase of a job and will be one of
+- Creating
+- Running
+- CleanUp
+- Failed
+- Done
+
+**state** - Provides the overall status of the job and will be one of
+- Running
+- Succeeded
+- Failed
+
+For each replica type in the job, there will be a `ReplicaStatus` that
+provides the number of replicas of that type in each state.
+
+For each replica type, the job creates a set of K8s
+[Job Controllers](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/)
+named
+
+```
+${REPLICA-TYPE}-${RUNTIME_ID}-${INDEX}
+```
+
+For example, if you have 2 servers and the runtime id is "76n0", then `MXJob`
+will create the following two jobs:
+
+```
+server-76no-0
+server-76no-1
+```
+
+## Contributing
+
+Please refer to [this document](./CONTRIBUTING.md) for contributing guidelines.
+
+## Community
+
+Please check out the [Kubeflow community page](https://www.kubeflow.org/docs/about/community/) for more information on how to get involved in our community.
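The monitoring step above uses `kubectl`; the same status can be read programmatically. A minimal sketch using the official Kubernetes Python client, assuming the v1 MXJob shown in the sample output (group/version/plural would differ on older releases):

```python
from kubernetes import client, config

# Uses your local kubeconfig, like kubectl does.
config.load_kube_config()
api = client.CustomObjectsApi()

# Equivalent to: kubectl get -o yaml mxjobs mxnet-job -n default
mxjob = api.get_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="default",
    plural="mxjobs", name="mxnet-job")

# Print the condition history (Created / Running / Succeeded).
for cond in mxjob.get("status", {}).get("conditions", []):
    print(cond["type"], cond["status"], cond.get("reason", ""))
```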
@@ -1,13 +1,9 @@
 +++
 title = "PyTorch Training"
 description = "Instructions for using PyTorch"
-weight = 35
+weight = 15

 +++
-{{% alert title="Out of date" color="warning" %}}
-This guide contains outdated information pertaining to Kubeflow 1.0. This guide
-needs to be updated for Kubeflow 1.1.
-{{% /alert %}}

 {{% stable-status %}}
@@ -2,13 +2,9 @@
 title = "TensorFlow Training (TFJob)"
 linkTitle = "TensorFlow Training (TFJob)"
 description = "Using TFJob to train a model with TensorFlow"
-weight = 60
+weight = 10

 +++
-{{% alert title="Out of date" color="warning" %}}
-This guide contains outdated information pertaining to Kubeflow 1.0. This guide
-needs to be updated for Kubeflow 1.1.
-{{% /alert %}}

 {{% stable-status %}}
@@ -0,0 +1,6 @@
+approvers:
+- Bobgy
+- RFMVasconcelos
+
+reviewers:
+- 8bitmp3
@@ -0,0 +1,5 @@
++++
+title = "Distributions"
+description = "A list of available Kubeflow distributions"
+weight = 40
++++
@@ -1,5 +1,5 @@
 +++
 title = "Kubeflow on AWS"
 description = "Running Kubeflow on Kubernetes Engine and Amazon Web Services"
-weight = 50
+weight = 20
 +++
@@ -1,5 +1,5 @@
 +++
 title = "Kubeflow on Azure"
 description = "Running Kubeflow on Kubernetes Engine and Microsoft Azure"
-weight = 50
+weight = 20
 +++
@@ -184,4 +184,4 @@ Run the following commands to set up and deploy Kubeflow.

 ## Additional information

-You can find general information about Kubeflow configuration in the guide to [configuring Kubeflow with kfctl and kustomize](/docs/other-guides/kustomize/).
+You can find general information about Kubeflow configuration in the guide to [configuring Kubeflow with kfctl and kustomize](/docs/methods/kfctl/kustomize/).
|
Before Width: | Height: | Size: 239 KiB After Width: | Height: | Size: 239 KiB |
|
Before Width: | Height: | Size: 206 KiB After Width: | Height: | Size: 206 KiB |
|
Before Width: | Height: | Size: 82 KiB After Width: | Height: | Size: 82 KiB |
|
Before Width: | Height: | Size: 69 KiB After Width: | Height: | Size: 69 KiB |
|
Before Width: | Height: | Size: 33 KiB After Width: | Height: | Size: 33 KiB |
|
Before Width: | Height: | Size: 116 KiB After Width: | Height: | Size: 116 KiB |
|
Before Width: | Height: | Size: 175 KiB After Width: | Height: | Size: 175 KiB |
|
Before Width: | Height: | Size: 66 KiB After Width: | Height: | Size: 66 KiB |
|
Before Width: | Height: | Size: 191 KiB After Width: | Height: | Size: 191 KiB |
|
Before Width: | Height: | Size: 266 KiB After Width: | Height: | Size: 266 KiB |
|
Before Width: | Height: | Size: 230 KiB After Width: | Height: | Size: 230 KiB |
|
|
@@ -0,0 +1,5 @@
+approvers:
+- RFMVasconcelos
+- knkski
+reviewers:
+- DomFleischmann
@@ -0,0 +1,5 @@
++++
+title = "Kubeflow Charmed Operators"
+description = "Charmed Operators for Kubeflow deployment and day-2 operations"
+weight = 50
++++
@ -0,0 +1,110 @@
|
|||
+++
|
||||
title = "Installing Kubeflow with Charmed Operators"
|
||||
description = "Instructions for Kubeflow deployment with Kubeflow Charmed Operators"
|
||||
weight = 10
|
||||
+++
|
||||
|
||||
This guide outlines the steps you need to install and deploy Kubeflow with [Charmed Operators](https://charmed-kubeflow.io/docs) and [Juju](https://juju.is/docs/kubernetes) on any conformant Kubernetes, including [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/), [Amazon Elastic Kubernetes Service (EKS)](https://docs.aws.amazon.com/eks/index.html), [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine/docs/), [OpenShift](https://docs.openshift.com), and any [kubeadm](https://kubernetes.io/docs/reference/setup-tools/kubeadm/)-deployed cluster (provided that you have access to it via `kubectl`).
|
||||
|
||||
#### 1. Install the Juju client
|
||||
|
||||
On Linux, install `juju` via [snap](https://snapcraft.io/docs/installing-snapd) with the following command:
|
||||
|
||||
```bash
|
||||
snap install juju --classic
|
||||
```
|
||||
|
||||
If you use macOS, you can use [Homebrew](https://brew.sh) and type `brew install juju` in the command line. For Windows, download the Windows [installer for Juju](https://launchpad.net/juju/2.8/2.8.5/+download/juju-setup-2.8.5-signed.exe).
|
||||
|
||||
#### 2. Connect Juju to your Kubernetes cluster
|
||||
|
||||
To operate workloads in your Kubernetes cluster with Juju, you have to add the cluster to the list of *clouds* in Juju via the `add-k8s` command.
|
||||
|
||||
If your Kubernetes config file is in the default location (such as `~/.kube/config` on Linux) and you only have one cluster, you can simply run:
|
||||
|
||||
```bash
|
||||
juju add-k8s myk8s
|
||||
```
|
||||
If your kubectl config file contains multiple clusters, you can specify the appropriate one by name:
|
||||
|
||||
```bash
|
||||
juju add-k8s myk8s --cluster-name=foo
|
||||
```
|
||||
Finally, to use a different config file, you can set the `KUBECONFIG` environment variable to point to the relevant file. For example:
|
||||
|
||||
```bash
|
||||
KUBECONFIG=path/to/file juju add-k8s myk8s
|
||||
```
|
||||
|
||||
For more details, go to the [official Juju documentation](https://juju.is/docs/clouds).
|
||||
|
||||
#### 3. Create a controller
|
||||
|
||||
To operate workloads on your Kubernetes cluster, Juju uses controllers. You can create a controller with the `bootstrap` command:
|
||||
|
||||
```bash
|
||||
juju bootstrap myk8s my-controller
|
||||
```
|
||||
|
||||
This command will create a couple of pods under the `my-controller` namespace. You can see your controllers with the `juju controllers` command.
|
||||
|
||||
You can read more about controllers in the [Juju documentation](https://juju.is/docs/creating-a-controller).
|
||||
|
||||
#### 4. Create a model
|
||||
|
||||
A model in Juju is a blank canvas where your operators will be deployed, and it holds a 1:1 relationship with a Kubernetes namespace.
|
||||
|
||||
You can create a model and give it a name, e.g. `kubeflow`, with the `add-model` command, and you will also be creating a Kubernetes namespace of the same name:
|
||||
|
||||
```bash
|
||||
juju add-model kubeflow
|
||||
```
|
||||
You can list your models with the `juju models` command.

#### 5. Deploy Kubeflow

[note type="caution" status="MIN RESOURCES"]
To deploy `kubeflow`, you'll need at least 50GB of disk, 14GB of RAM, and 2 CPUs available on your machine/VM.
If you have fewer resources, deploy `kubeflow-lite` or `kubeflow-edge`.
[/note]

Once you have a model, you can simply `juju deploy` any of the provided [Kubeflow bundles](https://charmed-kubeflow.io/docs/operators-and-bundles) into your cluster. For the _Kubeflow lite_ bundle, run:

```bash
juju deploy kubeflow-lite
```

and your Kubeflow installation should begin!

You can observe your Kubeflow deployment being spun up with the following command:

```bash
watch -c juju status --color
```

#### 6. Add an RBAC role for Istio

At the time of writing this guide, to set up Kubeflow with [Istio](https://istio.io) correctly, you need to give the `istio-ingressgateway` operator access to Kubernetes resources. Use the following command to create the appropriate role:

```bash
kubectl patch role -n kubeflow istio-ingressgateway-operator -p '{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"name":"istio-ingressgateway-operator"},"rules":[{"apiGroups":["*"],"resources":["*"],"verbs":["*"]}]}'
```
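
To confirm the patch was applied, you can inspect the role afterwards (an optional check, not part of the original steps):

```bash
# The rules section should now grant all verbs on all resources
kubectl get role istio-ingressgateway-operator -n kubeflow -o yaml
```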

#### 7. Set URL in authentication methods

Finally, you need to enable access to your Kubeflow dashboard. Provide the dashboard's public URL to dex-auth and oidc-gatekeeper as follows:

```bash
juju config dex-auth public-url=http://<URL>
juju config oidc-gatekeeper public-url=http://<URL>
```

where in place of `<URL>` you should use the hostname that the Kubeflow dashboard responds to.
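
For example, with a hypothetical ingress hostname of `kubeflow.example.com` (substitute your own):

```bash
juju config dex-auth public-url=http://kubeflow.example.com
juju config oidc-gatekeeper public-url=http://kubeflow.example.com
```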

#### More documentation

For more documentation, visit the [Charmed Kubeflow website](https://charmed-kubeflow.io/docs).

#### Having issues?

If you have any issues or questions, feel free to create a GitHub issue [here](https://github.com/canonical/bundle-kubeflow/issues).

@ -1,7 +1,7 @@

approvers:
- Bobgy
- joeliedtke
- rmgogogo
- zijianjoy
reviewers:
- 8bitmp3
- joeliedtke

@ -1,5 +1,5 @@

+++
title = "Kubeflow on GCP"
description = "Running Kubeflow on Kubernetes Engine and Google Cloud Platform"
weight = 20
+++

@ -4,10 +4,6 @@ description = "Running Kubeflow across on-premises and cloud environments with A

weight = 12

+++
{{% alert title="Out of date" color="warning" %}}
This guide contains outdated information pertaining to Kubeflow 1.0. This guide
needs to be updated for Kubeflow 1.1.
{{% /alert %}}

[Anthos](https://cloud.google.com/anthos) is a hybrid and multi-cloud
application platform developed and supported by Google. Anthos is built on

@ -16,7 +12,7 @@ open source technologies, including Kubernetes, Istio, and Knative.

Using Anthos, you can create a consistent setup across your on-premises and
cloud environments, helping you to automate policy and security at scale.

We are collecting interest for Kubeflow on GKE On Prem. You can subscribe
to the GitHub issue [kubeflow/gcp-blueprints#138](https://github.com/kubeflow/gcp-blueprints/issues/138).

## Next steps

@ -105,7 +105,7 @@ You can use [kustomize](https://kustomize.io/) to customize Kubeflow.

Make sure that you have the minimum required version of kustomize:
<b>{{% kustomize-min-version %}}</b> or later. For more information about
kustomize in Kubeflow, see
[how Kubeflow uses kustomize](/docs/methods/kfctl/kustomize/).

To customize the Kubernetes resources running within the cluster, you can modify
the kustomize manifests in `${KF_DIR}/kustomize`.
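
For instance, a minimal sketch of previewing an edit before applying it (assuming `kustomize` is installed and `KF_DIR` points at your Kubeflow application directory; the `<app>` path is illustrative):

```bash
# Render the manifests for one application without applying them,
# so you can review the effect of your kustomize edits first
kustomize build ${KF_DIR}/kustomize/<app> | less
```
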
@ -78,14 +78,14 @@ purpose. No tools will assume they actually exists in your terminal environment.

1. Install [Kustomize](https://kubectl.docs.kubernetes.io/installation/kustomize/).

   **Note:** Prior to Kubeflow v1.2, Kubeflow was compatible only with Kustomize `v3.2.1`. Starting from Kubeflow v1.2, you can use any `v3` Kustomize version to install Kubeflow. Kustomize `v4` is not supported out of the box yet. See the [official prerequisites](https://github.com/kubeflow/manifests/tree/master#prerequisites).

   To deploy Kustomize on a Linux or Mac machine, run the following commands:

   ```bash
   # Download the Kustomize install script and install a v3 release
   curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" > install_kustomize.sh
   bash ./install_kustomize.sh 3.2.0
   # Add the kustomize package to your $PATH env variable
   sudo mv ./kustomize /usr/local/bin/kustomize
   ```
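
   You can then confirm which version ended up on your `PATH` (it should report a `v3` release):

   ```bash
   kustomize version
   ```
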
@ -124,7 +124,7 @@ It's time to get started!

[tensorflow]: https://www.tensorflow.org/
[tf-train]: https://www.tensorflow.org/api_guides/python/train
[tf-serving]: https://www.tensorflow.org/tfx/guide/serving

[kubernetes]: https://kubernetes.io/
[kubernetes-engine]: https://cloud.google.com/kubernetes-engine/

@ -1,5 +1,5 @@

+++
title = "Kubeflow on IBM Cloud"
description = "Running Kubeflow on IBM Cloud Kubernetes Service (IKS)"
weight = 20
+++

@ -0,0 +1,289 @@

+++
title = "Create or access an IBM Cloud Kubernetes cluster on a VPC"
description = "Instructions for creating or connecting to a Kubernetes cluster on IBM Cloud vpc-gen2"
weight = 4
+++

## Create and set up a new cluster

Follow these steps to create and set up a new IBM Cloud Kubernetes Service (IKS) cluster on the `vpc-gen2` provider.

A `vpc-gen2` cluster does not expose each node to the public internet directly, and thus has a more secure,
but also more complex, network setup. It is the recommended setup for secure production use of Kubeflow.

### Setting environment variables

Choose the region and the worker node provider for your cluster, and set the environment variables.

```shell
export KUBERNETES_VERSION=1.18
export CLUSTER_ZONE=us-south-3
export CLUSTER_NAME=kubeflow-vpc
```

where:

- `KUBERNETES_VERSION`: Run `ibmcloud ks versions` to see the supported Kubernetes versions. Refer to the
  [supported version matrix](https://www.kubeflow.org/docs/started/k8s/overview/#minimum-system-requirements).
- `CLUSTER_ZONE`: Run `ibmcloud ks locations` to list supported zones. For example, choose `us-south-3` to create your
  cluster in the Dallas (US) data center.
- `CLUSTER_NAME`: Must be lowercase and unique among any other Kubernetes
  clusters in the specified `${CLUSTER_ZONE}`.

**Notice**: Refer to [Creating clusters](https://cloud.ibm.com/docs/containers?topic=containers-clusters) in the IBM
Cloud documentation for additional information on how to set up other providers and zones in your cluster.
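
A quick way to sanity-check these values before exporting them (a sketch, assuming the `ibmcloud` CLI and the `ks` plugin are installed):

```shell
# List Kubernetes versions supported by IKS
ibmcloud ks versions
# List zones available for the vpc-gen2 provider
ibmcloud ks zones --provider vpc-gen2
```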

### Choosing a worker node flavor

The worker node flavor names vary across zones and providers. Run
`ibmcloud ks flavors --zone ${CLUSTER_ZONE} --provider vpc-gen2` to list available flavors.

Below are some examples of flavors supported in the `us-south-3` zone with the `vpc-gen2` node provider:

```shell
ibmcloud ks flavors --zone us-south-3 --provider vpc-gen2
```

Example output:

```
For more information about these flavors, see 'https://ibm.biz/flavors'
Name         Cores   Memory   Network Speed   OS             Server Type   Storage   Secondary Storage   Provider
bx2.16x64    16      64GB     16Gbps          UBUNTU_18_64   virtual       100GB     0B                  vpc-gen2
bx2.2x8†     2       8GB      4Gbps           UBUNTU_18_64   virtual       100GB     0B                  vpc-gen2
bx2.32x128   32      128GB    16Gbps          UBUNTU_18_64   virtual       100GB     0B                  vpc-gen2
bx2.48x192   48      192GB    16Gbps          UBUNTU_18_64   virtual       100GB     0B                  vpc-gen2
bx2.4x16     4       16GB     8Gbps           UBUNTU_18_64   virtual       100GB     0B                  vpc-gen2
...
```

The recommended configuration for a cluster is at least 8 vCPU cores with 16GB memory. Hence, we recommend the
`bx2.4x16` flavor to create a two-worker-node cluster. Keep in mind that you can always scale the cluster
by adding more worker nodes should your application scale up.

Now set the environment variable with the flavor you chose:

```shell
export WORKER_NODE_FLAVOR=bx2.4x16
```

## Create an IBM Cloud Kubernetes cluster for `vpc-gen2` infrastructure

Creating a `vpc-gen2` based cluster needs a VPC, a subnet, and a public gateway attached to it. Fortunately, this is a
one-time setup: future `vpc-gen2` clusters can reuse the same VPC and subnet (with the attached public gateway).

1. Begin by installing the `vpc-infrastructure` plugin:

   ```shell
   ibmcloud plugin install vpc-infrastructure
   ```

   Refer to the [IBM Cloud VPC cluster tutorial](https://cloud.ibm.com/docs/containers?topic=containers-vpc_ks_tutorial) for more information.

2. Target generation 2 to access Gen 2 VPC resources:

   ```shell
   ibmcloud is target --gen 2
   ```

   Verify that the target is correctly set up:

   ```shell
   ibmcloud is target
   ```

   Example output:

   ```
   Target Generation: 2
   ```

3. Create or use an existing VPC:

   a) To use an existing VPC, list the available VPCs:

   ```shell
   ibmcloud is vpcs
   ```

   Example output:

   ```
   Listing vpcs for generation 2 compute in all resource groups and region ...
   ID                                        Name      Status      Classic access   Default network ACL         Default security group         Resource group
   r006-hidden-68cc-4d40-xxxx-4319fa3gxxxx   my-vpc1   available   false            husker-sloping-bee-resize   blimp-hasty-unaware-overflow   kubeflow
   ```

   If the above list contains a VPC that can be used to deploy your cluster, make a note of its ID.

   b) To create a new VPC, proceed as follows:

   ```shell
   ibmcloud is vpc-create my-vpc
   ```

   Example output:

   ```
   Creating vpc my-vpc in resource group kubeflow under account IBM as ...

   ID       r006-hidden-68cc-4d40-xxxx-4319fa3fxxxx
   Name     my-vpc
   ...
   ```

   **Save the ID in the `VPC_ID` variable as follows, so that we can use it later:**

   ```shell
   export VPC_ID=r006-hidden-68cc-4d40-xxxx-4319fa3fxxxx
   ```

4. Create or use an existing subnet:

   a) To use an existing subnet, list the available subnets:

   ```shell
   ibmcloud is subnets
   ```

   Example output:

   ```
   Listing subnets for generation 2 compute in all resource groups and region ...
   ID                                          Name        Status      Subnet CIDR       Addresses     ACL                         Public Gateway   VPC      Zone         Resource group
   0737-27299d09-1d95-4a9d-a491-a6949axxxxxx   my-subnet   available   10.240.128.0/18   16373/16384   husker-sloping-bee-resize   my-gateway       my-vpc   us-south-3   kubeflow
   ```

   If the above list contains a subnet in your VPC that can be used to deploy your cluster, make sure
   you note its ID.

   b) To create a new subnet:
   - List the address prefixes and note the CIDR block corresponding to your zone;
     in the example below, for zone `us-south-3` the CIDR block is `10.240.128.0/18`.

   ```shell
   ibmcloud is vpc-address-prefixes $VPC_ID
   ```

   Example output:

   ```
   Listing address prefixes of vpc r006-hidden-68cc-4d40-xxxx-4319fa3fxxxx under account IBM as user new@user-email.com...
   ID                                          Name                                CIDR block        Zone         Has subnets   Is default   Created
   r006-xxxxxxxx-4002-46d2-8a4f-f69e7ba3xxxx   rising-rectified-much-brew          10.240.0.0/18     us-south-1   false         true         2021-03-05T14:58:39+05:30
   r006-xxxxxxxx-dca9-4321-bb6c-960c4424xxxx   retrial-reversal-pelican-cavalier   10.240.64.0/18    us-south-2   false         true         2021-03-05T14:58:39+05:30
   r006-xxxxxxxx-7352-4a46-bfb1-fcbac6cbxxxx   subfloor-certainly-herbal-ajar      10.240.128.0/18   us-south-3   false         true         2021-03-05T14:58:39+05:30
   ```

   - Now create a subnet as follows:

   ```shell
   ibmcloud is subnet-create my-subnet $VPC_ID $CLUSTER_ZONE --ipv4-cidr-block "10.240.128.0/18"
   ```

   Example output:

   ```
   Creating subnet my-subnet in resource group kubeflow under account IBM as user new@user-email.com...

   ID       0737-27299d09-1d95-4a9d-a491-a6949axxxxxx
   Name     my-subnet
   ```

   - Make sure you export the subnet ID as follows:

   ```shell
   export SUBNET_ID=0737-27299d09-1d95-4a9d-a491-a6949axxxxxx
   ```

5. Create a `vpc-gen2` based Kubernetes cluster:

   ```shell
   ibmcloud ks cluster create vpc-gen2 \
     --name $CLUSTER_NAME \
     --zone $CLUSTER_ZONE \
     --version ${KUBERNETES_VERSION} \
     --flavor ${WORKER_NODE_FLAVOR} \
     --vpc-id ${VPC_ID} \
     --subnet-id ${SUBNET_ID} \
     --workers 2
   ```
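
   Provisioning takes a while. You can poll the cluster state until it reports `normal` (a sketch, reusing the variables exported above):

   ```shell
   # The cluster is ready for the next steps once its State column shows "normal"
   ibmcloud ks clusters --provider vpc-gen2 | grep ${CLUSTER_NAME}
   ```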

6. Attach a public gateway

   This step is mandatory for the Kubeflow deployment to succeed, because pods need public internet access to download images.

   - First, check if your cluster is already assigned a public gateway:

   ```shell
   ibmcloud is pubgws
   ```

   Example output:

   ```
   Listing public gateways for generation 2 compute in all resource groups and region ...
   ID                                          Name         Status      Floating IP       VPC      Zone         Resource group
   r006-xxxxxxxx-5731-4ffe-bc51-1d9e5fxxxxxx   my-gateway   available   xxx.xxx.xxx.xxx   my-vpc   us-south-3   default
   ```

   In the above output, a gateway is already attached to the VPC `my-vpc`. If no gateway is attached, proceed with
   the rest of the setup.

   - Next, attach a public gateway by running the following command:

   ```shell
   ibmcloud is public-gateway-create my-gateway $VPC_ID $CLUSTER_ZONE
   ```

   Example output:

   ```
   ID: r006-xxxxxxxx-5731-4ffe-bc51-1d9e5fxxxxxx
   ```

   Save the generated gateway ID as follows:

   ```shell
   export GATEWAY_ID="r006-xxxxxxxx-5731-4ffe-bc51-1d9e5fxxxxxx"
   ```

   - Finally, attach the public gateway to the subnet:

   ```shell
   ibmcloud is subnet-update $SUBNET_ID --public-gateway-id $GATEWAY_ID
   ```

   Example output:

   ```
   Updating subnet 0737-27299d09-1d95-4a9d-a491-a6949axxxxxx under account IBM as user new@user-email.com...

   ID       0737-27299d09-1d95-4a9d-a491-a6949axxxxxx
   Name     my-subnet
   ...
   ```

### Verifying the cluster

To use the created cluster, switch the Kubernetes context to point to the cluster:

```shell
ibmcloud ks cluster config --cluster ${CLUSTER_NAME}
```

Make sure all worker nodes are up with the command below:

```shell
kubectl get nodes
```

and verify that all the nodes are in the `Ready` state.

### Delete the cluster

Delete the cluster, including its storage:

```shell
ibmcloud ks cluster rm --force-delete-storage -c ${CLUSTER_NAME}
```

@ -44,12 +44,23 @@ Get the Kubeconfig file:

ibmcloud ks cluster config --cluster $CLUSTER_NAME
```

From here on, go to [Install Kubeflow on IKS](/docs/ibm/deploy/install-kubeflow-on-iks) for more information.


## Create and set up a new cluster

Follow these steps to create and set up a new IBM Cloud Kubernetes Service (IKS) cluster:
* Use a `classic` provider if you want to try out Kubeflow.
* Use a `vpc-gen2` provider if you are familiar with cloud networking and want to deploy Kubeflow in a secure environment.

A `classic` provider exposes each cluster node to the public internet and therefore has
a relatively simpler networking setup. Services exposed using Kubernetes `NodePort` need to be secured using
an authentication mechanism.

To create a cluster with the `vpc-gen2` provider, follow the
[Create a cluster on IKS with a `vpc-gen2` provider](/docs/ibm/create-cluster-vpc)
guide.

The next section explains how to create and set up a new IBM Cloud Kubernetes Service (IKS)
cluster with the `classic` provider.

### Setting environment variables

@ -62,20 +73,41 @@ export WORKER_NODE_PROVIDER=classic

export CLUSTER_NAME=kubeflow
```

where:

- `KUBERNETES_VERSION` specifies the Kubernetes version for the cluster. Run `ibmcloud ks versions` to see the supported
  Kubernetes versions. If this environment variable is not set, the cluster will be created with the default version set
  by IBM Cloud Kubernetes Service. Refer to the
  [minimum system requirements](https://www.kubeflow.org/docs/started/k8s/overview/#minimum-system-requirements)
  and choose a Kubernetes version compatible with the Kubeflow release to be deployed.
- `CLUSTER_ZONE` identifies the region or location where the cluster will be created. Run `ibmcloud ks locations` to
  list supported IBM Cloud Kubernetes Service locations. For example, choose `dal13` to create your cluster in the
  Dallas (US) data center.
- `WORKER_NODE_PROVIDER` specifies the kind of IBM Cloud infrastructure on which the Kubernetes worker nodes will be
  created. The `classic` type supports worker nodes with GPUs. There are other worker node providers, including
  `vpc-classic` and `vpc-gen2`, where zone names and worker flavors will be different. Run
  `ibmcloud ks zones --provider classic` to list zone names for the `classic` provider and set the `CLUSTER_ZONE`
  accordingly.
- `CLUSTER_NAME` must be lowercase and unique among any other Kubernetes
  clusters in the specified `${CLUSTER_ZONE}`.

**Notice**: Refer to [Creating clusters](https://cloud.ibm.com/docs/containers?topic=containers-clusters) in the IBM
Cloud documentation for additional information on how to set up other providers and zones in your cluster.
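
Before exporting the variables, you may want to list the valid options (a sketch, assuming the `ibmcloud` CLI and `ks` plugin are installed):

```shell
# Supported Kubernetes versions and classic-provider zones
ibmcloud ks versions
ibmcloud ks zones --provider classic
```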

### Choosing a worker node flavor

The worker node flavor name varies from zones and providers. Run
`ibmcloud ks flavors --zone ${CLUSTER_ZONE} --provider ${WORKER_NODE_PROVIDER}` to list available flavors.

For example, the following are some worker node flavors supported in the `dal13` zone with a `classic` node provider.

```shell
ibmcloud ks flavors --zone dal13 --provider classic
```

Example output:

```
OK
For more information about these flavors, see 'https://ibm.biz/flavors'
Name       Cores   Memory   Network Speed   OS             Server Type   Storage   Secondary Storage   Provider
@ -92,15 +124,18 @@ b3c.8x32 8 32GB 1000Mbps UBUNTU_18_64 virtua

...
```

Choose a flavor that will work for your applications. For the purpose of the Kubeflow deployment, the recommended
configuration for a cluster is at least 8 vCPU cores with 16GB memory. Hence you can either choose the `b3c.8x32` flavor
to create a one-worker-node cluster or choose the `b3c.4x16` flavor to create a two-worker-node cluster. Keep in mind
that you can always scale the cluster by adding more worker nodes should your application scale up.

Now, set the environment variable with the worker node flavor of your choice:

```shell
export WORKER_NODE_FLAVOR=b3c.4x16
```

### Creating an IBM Cloud Kubernetes cluster

Run the following command to create a cluster:

|
|
@ -115,7 +150,11 @@ ibmcloud ks cluster create ${WORKER_NODE_PROVIDER} \

Replace the `workers` parameter above with the desired number of worker nodes.

**Note**: If you're starting in a fresh account with no public and private VLANs, they are created automatically for you
when creating a Kubernetes cluster with the worker node provider `classic` for the first time. If you already have VLANs
configured in your account, retrieve them via `ibmcloud ks vlans --zone ${CLUSTER_ZONE}` and include the public and
private VLAN ids (set in the `PUBLIC_VLAN_ID` and `PRIVATE_VLAN_ID` environment variables) in the command, for example:

```shell
ibmcloud ks cluster create ${WORKER_NODE_PROVIDER} \
|
@ -128,10 +167,11 @@ ibmcloud ks cluster create ${WORKER_NODE_PROVIDER} \

  --public-vlan ${PUBLIC_VLAN_ID}
```

Wait until the cluster is deployed and configured. It can take a while for the cluster to be ready. Run the following
command to periodically check the state of your cluster. Your cluster is ready when the state is `normal`.

```shell
ibmcloud ks clusters --provider ${WORKER_NODE_PROVIDER} | grep ${CLUSTER_NAME} | awk '{print "Name:"$1"\tState:"$3}'
```

### Verifying the cluster

@ -149,3 +189,11 @@ kubectl get nodes

```

and make sure all the nodes are in the `Ready` state.

### Delete the cluster

Delete the cluster, including its storage:

```shell
ibmcloud ks cluster rm --force-delete-storage -c ${CLUSTER_NAME}
```