Compare commits

...

80 Commits

Author SHA1 Message Date
dependabot[bot] f8ee31410c
chore(deps): bump actions/setup-java from 4 to 5 (#1366)
Bumps [actions/setup-java](https://github.com/actions/setup-java) from 4 to 5.
- [Release notes](https://github.com/actions/setup-java/releases)
- [Commits](https://github.com/actions/setup-java/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-java
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-26 02:37:19 +00:00
dependabot[bot] ec5255280c
chore(deps): bump actions/checkout from 4 to 5 (#1359)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-14 03:36:12 +00:00
dependabot[bot] d1f7be63ab
chore(deps): bump actions/download-artifact from 4 to 5 (#1356)
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-14 03:35:12 +00:00
dependabot[bot] a190ca253b
chore(deps): bump github.com/spf13/pflag from 1.0.6 to 1.0.7 (#1352)
---
updated-dependencies:
- dependency-name: github.com/spf13/pflag
  dependency-version: 1.0.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-23 05:34:59 +00:00
dependabot[bot] 695c2c67f0
chore(deps): bump golang.org/x/crypto from 0.39.0 to 0.40.0 (#1351)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.39.0 to 0.40.0.
- [Commits](https://github.com/golang/crypto/compare/v0.39.0...v0.40.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-22 02:37:58 +00:00
Yi Chen 75ec421d62
Bump helm.sh/helm/v3 from 3.16.3 to 3.18.4 (#1350)
* Bump golang version from 1.23.10 to 1.24.0

Signed-off-by: Yi Chen <github@chenyicn.net>

* Fix go vet check

Signed-off-by: Yi Chen <github@chenyicn.net>

* Bump helm.sh/helm/v3 from 3.16.3 to 3.18.4

Signed-off-by: Yi Chen <github@chenyicn.net>

* Run go mod vendor

Signed-off-by: Yi Chen <github@chenyicn.net>

* Retrieve Helm version from go.mod file

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-07-11 14:56:52 +00:00
Yi Chen 25d7b1109e
Release v0.15.1 (#1344)
* Release v0.15.1

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add changelog for v0.15.1

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-26 06:52:17 +00:00
dependabot[bot] d2d5f77a97
chore(deps): bump golang.org/x/crypto from 0.38.0 to 0.39.0 (#1334)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.31.0 to 0.39.0.
- [Commits](https://github.com/golang/crypto/compare/v0.31.0...v0.39.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-25 06:30:16 +00:00
dependabot[bot] c4ccb4ca7e
chore(deps): bump github.com/prometheus/common from 0.60.1 to 0.65.0 (#1343)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.60.1 to 0.65.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](https://github.com/prometheus/common/compare/v0.60.1...v0.65.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-version: 0.65.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-25 06:20:15 +00:00
Yi Chen aa33dc51b7
Bump golang version from 1.22.7 to 1.23.10 (#1345)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-25 06:06:16 +00:00
Yi Chen 9e84dad37a
Fix golangci-lint issues (#1341)
* Bump golangci-lint version from v1.57.2 to v2.1.6

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add golangci-lint.yaml

Signed-off-by: Yi Chen <github@chenyicn.net>

* Fix golangci-lint issues

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-23 17:04:14 +00:00
Yi Chen c9d5653de3
Add support for configuring tolerations (#1337)
* Add support for configuring tolerations

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add basic Helm chart unittests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add Helm chart unit tests to GitHub CI workflow

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-23 13:01:13 +00:00
Yi Chen 4618e321ab
Update uninstall bash script (#1335)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-23 12:58:14 +00:00
Yi Chen ca7bf97da4
[CI] Add CI workflow for releasing Arena images (#1340)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-23 12:57:14 +00:00
Yi Chen 1c633d76ff
Remove kubernetes artifacts (#1329)
* Remove Kubernetes artifacts

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update Makefile

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-23 12:53:14 +00:00
Yi Chen 3693f59663
Release v0.15.0 (#1332)
* Release v0.15.0

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add changelog for v0.15.0

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-04 15:12:14 +00:00
Syspretor fa2fad7d6e
Feat: support separate affinity policy configuration for PS and worke… (#1331)
Signed-off-by: 玖宇 <guotongyu.gty@alibaba-inc.com>
Co-authored-by: 玖宇 <guotongyu.gty@alibaba-inc.com>
2025-06-04 12:03:14 +00:00
Syspretor 8f4a602ce6
Feat: support affinity policy for kserve and tfjob (#1319)
Signed-off-by: 玖宇 <guotongyu.gty@alibaba-inc.com>
Co-authored-by: 玖宇 <guotongyu.gty@alibaba-inc.com>
2025-06-04 11:33:15 +00:00
Leoyzen ad85546c23
Add custom device support for kserve and kserving. (#1315)
* add custom device support for kserving.

Signed-off-by: Leoyzen <leoyzen@gmail.com>

* add custom device support for kserve.

Signed-off-by: Leoyzen <leoyzen@gmail.com>

---------

Signed-off-by: Leoyzen <leoyzen@gmail.com>
2025-06-04 02:45:14 +00:00
Yi Chen babcb76f91
Make number of replicas of tf-operator deployment configurable (#1323)
* Make tf-operator replicas configurable

Signed-off-by: Yi Chen <github@chenyicn.net>

* Make replicas of tf-operator spread out across different nodes

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-04 02:39:14 +00:00
Yi Chen ba7a09ace6
Make number of replicas of cron-operator deployment configurable (#1325)
* Make cron-operator replicas configurable

Signed-off-by: Yi Chen <github@chenyicn.net>

* Make replicas of cron-operator spread out across different nodes

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove '--enable-leader-election=true' from args

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-06-03 13:16:14 +00:00
Yi Chen 545f86bfe9
Delete all services when the TFJob is terminated (#1316)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-29 12:57:19 +00:00
co63oc 568e3845f5
Fix typos in multiple files (#1310)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-05-13 08:56:21 +00:00
co63oc 8b84559944
Fix typos in multiple files (#1304)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-05-12 12:45:38 +00:00
Yi Chen ee2384b911
fix: service account should use release namespace (#1308)
* Use release namespace

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove namespace from cluster scoped resource

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-05-12 12:23:38 +00:00
Yi Chen 2fbb3d7ed4
feat: add new value for using localtime in cron-operator (#1296)
* feat: add new value for using localtime in cron-operator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Rename localTime to useHostTimezone

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-04-03 07:31:33 +00:00
Yi Chen 19b5133e6e
refactor: use helm lib instead of helm binary (#1207)
* Delete func ListAllReleasesWithDetail

Signed-off-by: Yi Chen <github@chenyicn.net>

* Delete func ListReleaseMap

Signed-off-by: Yi Chen <github@chenyicn.net>

* Delete func ListReleases

Signed-off-by: Yi Chen <github@chenyicn.net>

* Delete func DeleteRelease

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add some helm util functions

Signed-off-by: Yi Chen <github@chenyicn.net>

* Delete func InstallRelease

Signed-off-by: Yi Chen <github@chenyicn.net>

* Delete func CheckRelease

Signed-off-by: Yi Chen <github@chenyicn.net>

* Refactor func GetChartVersion

Signed-off-by: Yi Chen <github@chenyicn.net>

* Refactor func GenerateHelmTemplate

Signed-off-by: Yi Chen <github@chenyicn.net>

* Move all helm related functions into util.go

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add missed import statements and run go mod tidy

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update copyright header

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add flag --helm-binary for forward compatibility

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-21 09:19:27 +00:00
Yi Chen 8d413b5861
Add stale bot to mark stale issues and PRs (#1141)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-21 05:14:26 +00:00
dependabot[bot] 2f6e202bbf
Bump github.com/containerd/containerd from 1.7.23 to 1.7.27 (#1290)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.23 to 1.7.27.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.23...v1.7.27)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-21 04:58:26 +00:00
Yi Chen f3d52fa73a
Add basic e2e tests (#1225)
* Add basic e2e tests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Run go mod vendor

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-21 04:02:27 +00:00
Yi Chen ece85b8ce3
fix: job status displays incorrectly (#1289)
* fix: job status displays incorrectly

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add go unit tests

Signed-off-by: Yi Chen <github@chenyicn.net>

* logging job status

Signed-off-by: Yi Chen <github@chenyicn.net>

* Adjust the order of running and queuing conditions

Signed-off-by: Yi Chen <github@chenyicn.net>

* Use constants instead of hard encoded status

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-20 09:51:27 +00:00
Yi Chen d497232013
Release v0.14.2 (#1282)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-10 02:26:01 +00:00
Yi Chen 9407f9b1a0
Update pytorch operator image (#1281)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-03-10 01:56:01 +00:00
co63oc d9bf195879
Fix typos (#1276)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-03-06 03:11:39 +00:00
Yi Chen 19abf194bb
Release v0.14.1 (#1275)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-24 03:06:45 +00:00
Yi Chen 1f9350d78c
unset env NVIDIA_VISIBLE_DEVICES when gpushare is enabled (#1273)
* unset env NVIDIA_VISIBLE_DEVICES when gpushare is enabled

Signed-off-by: Yi Chen <github@chenyicn.net>

* Group constants into one const block

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-24 02:34:45 +00:00
Yi Chen 23e9731b52
fix: pytorchjob does not support backoff limit (#1272)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-19 06:57:41 +00:00
Yi Chen d6b177b93d
fix: format of tensorflow standalone training docs is messed up (#1265)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-12 12:18:29 +00:00
Yi Chen 0ca2670770
fix: device value does not support k8s resource quantity (#1267)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-12 12:17:29 +00:00
dependabot[bot] 7d7f75ad2d
Bump github.com/golang/glog from 1.2.3 to 1.2.4 (#1263)
Bumps [github.com/golang/glog](https://github.com/golang/glog) from 1.2.3 to 1.2.4.
- [Release notes](https://github.com/golang/glog/releases)
- [Commits](https://github.com/golang/glog/compare/v1.2.3...v1.2.4)

---
updated-dependencies:
- dependency-name: github.com/golang/glog
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 10:25:29 +00:00
DBMxrco 4b21f7299b
docs: fixed typo (#1257)
Signed-off-by: DBMxrco <marcoflet@yahoo.com>
2025-02-12 08:34:29 +00:00
Yi Chen 36a59bba67
Release v0.14.0 (#1264)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-12 06:43:28 +00:00
Yi Chen ccdbf44815
Add changelog for v0.13.1 (#1248)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-02-12 06:34:28 +00:00
dependabot[bot] 36b17b4175
Bump github.com/go-resty/resty/v2 from 2.16.0 to 2.16.5 (#1254)
Bumps [github.com/go-resty/resty/v2](https://github.com/go-resty/resty) from 2.16.0 to 2.16.5.
- [Release notes](https://github.com/go-resty/resty/releases)
- [Commits](https://github.com/go-resty/resty/compare/v2.16.0...v2.16.5)

---
updated-dependencies:
- dependency-name: github.com/go-resty/resty/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 06:26:29 +00:00
gujing 1058d48063
rename parameter (#1262)
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
2025-02-12 06:02:30 +00:00
AlanFokCo ce9c5f3bff
Update the version of elastic-job-supervisor in arena-artifacts (#1247)
Signed-off-by: AlanFokCo <892249240@qq.com>
2025-01-13 09:32:08 +00:00
Yi Chen 970afbd209
Add PyTorch mnist example (#1237)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-01-02 11:31:16 +00:00
Yi Chen f1bb3bcdbb
feat: add linux/arm64 support for et-operator image (#1241)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-01-02 11:00:16 +00:00
Yi Chen b814410627
feat: add linux/arm64 support for cron-operator image (#1240)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-01-02 10:59:16 +00:00
Yi Chen 38218aa3a0
feat: add linux/arm64 support for mpi-operator image (#1239)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-01-02 10:26:16 +00:00
Yi Chen 13fa5c8dc8
feat: add linux/arm64 support for tf-operator image (#1238)
Signed-off-by: Yi Chen <github@chenyicn.net>
2025-01-02 09:03:16 +00:00
Yi Chen f098f1af85
Release v0.13.0 (#1232)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-23 08:33:15 +00:00
Yi Chen b0e411cab5
Update pytorch-operator image (#1234)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-23 07:55:15 +00:00
dependabot[bot] 5e18210479
Bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (#1233)
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 13:55:12 +00:00
Yi Chen 13df29407c
Update tfjob standalone training job doc (#1222)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-20 05:29:11 +00:00
Yi Chen 0a701eb03d
Remove archived docs (#1208)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-20 05:26:12 +00:00
Yi Chen 0482946a0c
Add changelog for v0.12.1 (#1224)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-20 05:25:12 +00:00
dependabot[bot] 0d4b513d65
Bump golang.org/x/crypto from 0.29.0 to 0.31.0 (#1231)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.29.0 to 0.31.0.
- [Commits](https://github.com/golang/crypto/compare/v0.29.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 05:09:13 +00:00
dependabot[bot] e8b9fcd10d
Bump google.golang.org/protobuf from 1.35.1 to 1.36.0 (#1227)
Bumps google.golang.org/protobuf from 1.35.1 to 1.36.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 05:02:12 +00:00
Yi Chen 190c18e840
feat: add support for torchrun (#1228)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-19 11:32:11 +00:00
Yi Chen dc0929f32f
Avoid listing jobs and statefulsets when get pytorchjob (#1229)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-12-19 11:29:11 +00:00
Yi Chen 74ade74d3e
Release v0.12.1 (#1215)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-25 11:37:29 +00:00
Yi Chen 316e33c999
Update cron operator image (#1214)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-25 11:35:29 +00:00
dependabot[bot] fc47e460e1
Bump golang.org/x/crypto from 0.28.0 to 0.29.0 (#1206)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.28.0 to 0.29.0.
- [Commits](https://github.com/golang/crypto/compare/v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-18 15:06:23 +00:00
Yi Chen 1cba9b99dc
Add docs for releasing arena (#1201)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-18 12:29:23 +00:00
Yi Chen 866ec44648
Publish releases only on master branch (#1210)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-18 12:28:23 +00:00
cheyang ac164b85bf
Support MPI Job with generic devices (#1209)
Signed-off-by: cheyang <cheyang@163.com>
2024-11-18 03:03:22 +00:00
Qianlong d61a784a13
Fix the functionality of generating kubeconfig (#1204) (#1205)
Signed-off-by: 向先 <wangqianlong.wql@alibaba-inc.com>
Co-authored-by: 向先 <wangqianlong.wql@alibaba-inc.com>
2024-11-16 15:45:21 +00:00
dependabot[bot] 74fd3f2ad3
bump github.com/go-resty/resty/v2 from 2.15.3 to 2.16.0 (#1202)
---
updated-dependencies:
- dependency-name: github.com/go-resty/resty/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-15 09:38:20 +00:00
TzZtzt a765b1c5a0
Fix etjob rendering error when using local logging dir (#1203)
Signed-off-by: trafalgarzzz <trafalgarz@outlook.com>
2024-11-13 06:17:17 +00:00
Yi Chen 0838d54757
Add go mod vendor check to integration test (#1198)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-12 02:23:16 +00:00
Yi Chen ca735b6152
Add changelog for v0.12.0 (#1199)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-12 02:11:17 +00:00
Yi Chen 969ad681a3
Update tf-operator image to fix clean pod policy issues (#1200)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-12 01:55:16 +00:00
dependabot[bot] 29b2d6d2c5
Bump mkdocs-material from 9.5.42 to 9.5.44 (#1190)
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.42 to 9.5.44.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.42...9.5.44)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-11 06:07:15 +00:00
cheyang 22a3df5023
Support distributed serving with vendor update (#1194)
Signed-off-by: cheyang <cheyang@163.com>
2024-11-11 06:06:15 +00:00
lianhui lin 68b71f9006
Feat: add support for distributed serving type (#1187)
* Feat: support distributed serving type

Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>

* Fix command check

Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>

* Fix lint problem

Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>

---------

Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>
Co-authored-by: 林联辉 <linlianhui.llh@alibaba-inc.com>
2024-11-07 10:20:12 +00:00
dependabot[bot] 70278ce8f7
Bump github.com/prometheus/common from 0.60.0 to 0.60.1 (#1182)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.60.0 to 0.60.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](https://github.com/prometheus/common/compare/v0.60.0...v0.60.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-07 06:43:12 +00:00
dependabot[bot] 8e008a4916
Bump github.com/golang/glog from 1.2.2 to 1.2.3 (#1189)
Bumps [github.com/golang/glog](https://github.com/golang/glog) from 1.2.2 to 1.2.3.
- [Release notes](https://github.com/golang/glog/releases)
- [Commits](https://github.com/golang/glog/compare/v1.2.2...v1.2.3)

---
updated-dependencies:
- dependency-name: github.com/golang/glog
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-07 03:12:12 +00:00
Yi Chen 46a795e3db
Fix: unable to set cleanPodPolicy to All when submitting TFJob (#1191)
Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-07 02:53:12 +00:00
Yi Chen 76ca05975e
Add changelog for v0.11.0 (#1181)
* Add changelog for v0.11.0

Signed-off-by: Yi Chen <github@chenyicn.net>

* Bump version to v0.11.0

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
2024-11-07 02:05:12 +00:00
7,414 changed files with 2,288,343 additions and 13,526 deletions


@@ -3,7 +3,7 @@ name: Check Release
 on:
   pull_request:
     branches:
-      - release-*
+      - master
     paths:
       - VERSION
@@ -21,7 +21,7 @@ jobs:
     steps:
       - name: Checkout source code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0


@@ -20,7 +20,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout source code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
       - name: Set up Go
         uses: actions/setup-go@v5
@@ -35,11 +35,19 @@ jobs:
             exit 1
           fi
+      - name: Run go mod vendor
+        run: |
+          go mod vendor
+          if ! git diff --quiet; then
+            echo "Please run 'go mod vendor' to make vendored copy of dependencies"
+            exit 1
+          fi
       - name: Run go fmt check
         run: |
           make go-fmt
           if ! git diff --quiet; then
-            echo "Please run 'make go-fmt' to run go fmt aganist code"
+            echo "Please run 'make go-fmt' to run go fmt against code"
             exit 1
           fi
@@ -47,7 +55,7 @@ jobs:
         run: |
           make go-vet
           if ! git diff --quiet; then
-            echo "Please run 'make go-vet' to run go vet aganist code"
+            echo "Please run 'make go-vet' to run go vet against code"
             exit 1
           fi
@@ -55,10 +63,14 @@ jobs:
         run: |
           make go-lint
-      - name: Run unit tests
+      - name: Run Go unit tests
         run: |
           make unit-test
+      - name: Run Helm unit tests
+        run: |
+          make helm-unittest
       - name: Build arena binary
         run: |
           make arena
@@ -67,9 +79,9 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout source code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
-      - uses: actions/setup-java@v4
+      - uses: actions/setup-java@v5
         with:
           distribution: zulu
           java-version: 8
@@ -83,7 +95,7 @@ jobs:
     steps:
       - name: Checkout source code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
       - uses: actions/setup-python@v5
         with:
@@ -93,3 +105,33 @@ jobs:
         run: |
           pip install -r docs/requirements.txt
           mkdocs build --strict
+  e2e-test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout source code
+        uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+      - name: Set up Kind cluster
+        uses: helm/kind-action@v1
+        with:
+          node_image: kindest/node:v1.29.10
+          config: arena-artifacts/ci/kind-config.yaml
+      - name: Install arena client
+        run: |
+          make arena-installer
+          tar -zxf arena-installer-*.tar.gz
+          arena-installer-*/install.sh --only-binary
+      - name: Run e2e tests
+        run: |
+          make e2e-test


@@ -3,10 +3,14 @@ name: Release
 on:
   push:
     branches:
-      - release-*
+      - master
     paths:
       - VERSION
+env:
+  IMAGE_REGISTRY: ghcr.io
+  IMAGE_REPOSITORY: ${{ github.repository }}
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
@@ -26,7 +30,7 @@ jobs:
         - arm64
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
      - name: Read version from VERSION file
        run: |
@@ -49,15 +53,135 @@ jobs:
           if-no-files-found: error
           overwrite: true
-  push_tag:
+  build-arena-image:
+    name: Build Arena container image
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        platform:
+          - linux/amd64
+          - linux/arm64
+    steps:
+      - name: Prepare
+        run: |
+          platform=${{ matrix.platform }}
+          echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
+      - name: Checkout source code
+        uses: actions/checkout@v5
+      - name: Read version from VERSION file
+        run: |
+          VERSION=$(cat VERSION)
+          echo "VERSION=${VERSION}" >> $GITHUB_ENV
+      - name: Docker meta
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}
+          tags: |
+            type=semver,pattern={{version}},value=${{ env.VERSION }}
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Docker buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Login to container registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.IMAGE_REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Build and push by digest
+        id: build
+        uses: docker/build-push-action@v6
+        with:
+          platforms: ${{ matrix.platform }}
+          labels: ${{ steps.meta.outputs.labels }}
+          outputs: type=image,name=${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }},push-by-digest=true,name-canonical=true,push=true
+      - name: Export digest
+        run: |
+          mkdir -p /tmp/digests
+          digest="${{ steps.build.outputs.digest }}"
+          touch "/tmp/digests/${digest#sha256:}"
+      - name: Upload digest
+        uses: actions/upload-artifact@v4
+        with:
+          name: digests-${{ env.PLATFORM_PAIR }}
+          path: /tmp/digests/*
+          if-no-files-found: error
+          retention-days: 1
+  release-image:
     needs:
       - package-arena-installer
+      - build-arena-image
     runs-on: ubuntu-latest
     steps:
       - name: Checkout source code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
       - name: Read version from VERSION file
         run: |
           VERSION=$(cat VERSION)
           echo "VERSION=${VERSION}" >> $GITHUB_ENV
+      - name: Docker meta
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}
+          tags: |
+            type=semver,pattern={{version}},value=${{ env.VERSION }}
+      - name: Download digests
+        uses: actions/download-artifact@v5
+        with:
+          path: /tmp/digests
+          pattern: digests-*
+          merge-multiple: true
+      - name: Set up Docker buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Login to container registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.IMAGE_REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Create manifest list and push
+        working-directory: /tmp/digests
+        run: |
+          docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
+            $(printf '${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}@sha256:%s ' *)
+      - name: Inspect image
+        run: |
+          docker buildx imagetools inspect ${{ env.IMAGE_REGISTRY }}/${{ env.IMAGE_REPOSITORY }}:${{ steps.meta.outputs.version }}
+  push_tag:
+    needs:
+      - package-arena-installer
+      - release-image
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout source code
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0
@@ -77,7 +201,7 @@ jobs:
           git tag -a ${TAG} -m "Release v${VERSION}"
           git push origin ${TAG}
-  draft_relase:
+  draft_release:
     needs:
       - push_tag
@@ -88,7 +212,7 @@
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
      - name: Configure Git
        run: |
@@ -101,7 +225,7 @@
           echo "VERSION=${VERSION}" >> ${GITHUB_ENV}
      - name: Download arena installer tarballs
-        uses: actions/download-artifact@v4
+        uses: actions/download-artifact@v5
       with:
         pattern: arena-installer-${{ env.VERSION }}-{linux,darwin}-{amd64,arm64}

.github/workflows/stale.yaml (new file)

@@ -0,0 +1,43 @@
+# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
+#
+# You can adjust the behavior by modifying this file.
+# For more information, see:
+# https://github.com/actions/stale
+name: Mark stale issues and pull requests
+on:
+  schedule:
+    - cron: "0 0 * * 0"
+jobs:
+  stale:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      pull-requests: write
+    steps:
+      - uses: actions/stale@v9
+        with:
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          days-before-stale: 360
+          days-before-close: 180
+          stale-issue-message: >
+            This issue has been automatically marked as stale because it has not had
+            recent activity. It will be closed if no further activity occurs. Thank you
+            for your contributions.
+          close-issue-message: >
+            This issue has been automatically closed because it has not had recent
+            activity. Please comment "/reopen" to reopen it.
+          stale-issue-label: lifecycle/stale
+          exempt-issue-labels: lifecycle/frozen
+          stale-pr-message: >
+            This pull request has been automatically marked as stale because it has not had
+            recent activity. It will be closed if no further activity occurs. Thank you
+            for your contributions.
+          close-pr-message: >
+            This pull request has been automatically closed because it has not had recent
+            activity. Please comment "/reopen" to reopen it.
+          stale-pr-label: lifecycle/stale
+          exempt-pr-labels: lifecycle/frozen

.golangci.yaml (new file)

@@ -0,0 +1,76 @@
+version: "2"
+run:
+  # Timeout for total work, e.g. 30s, 5m, 5m30s.
+  # If the value is lower or equal to 0, the timeout is disabled.
+  # Default: 0 (disabled)
+  timeout: 2m
+linters:
+  # Enable specific linters.
+  # https://golangci-lint.run/usage/linters/#enabled-by-default
+  enable:
+    # Detects places where loop variables are copied.
+    - copyloopvar
+    # Checks for duplicate words in the source code.
+    - dupword
+    # Tool for detection of FIXME, TODO and other comment keywords.
+    # - godox
+    # Enforces consistent import aliases.
+    - importas
+    # Find code that shadows one of Go's predeclared identifiers.
+    - predeclared
+    # Check that struct tags are well aligned.
+    - tagalign
+    # Remove unnecessary type conversions.
+    - unconvert
+    # Checks Go code for unused constants, variables, functions and types.
+    - unused
+  # Disable specific linters.
+  disable:
+    # Errcheck is a program for checking for unchecked errors in Go code.
+    - errcheck
+  settings:
+    importas:
+      # List of aliases
+      alias:
+        - pkg: k8s.io/api/admissionregistration/v1
+          alias: admissionregistrationv1
+        - pkg: k8s.io/api/apps/v1
+          alias: appsv1
+        - pkg: k8s.io/api/batch/v1
+          alias: batchv1
+        - pkg: k8s.io/api/core/v1
+          alias: corev1
+        - pkg: k8s.io/api/extensions/v1beta1
+          alias: extensionsv1beta1
+        - pkg: k8s.io/api/networking/v1
+          alias: networkingv1
+        - pkg: k8s.io/apimachinery/pkg/apis/meta/v1
+          alias: metav1
+        - pkg: sigs.k8s.io/controller-runtime
+          alias: ctrl
+  exclusions:
+    # Which file paths to exclude: they will be analyzed, but issues from them won't be reported.
+    # "/" will be replaced by the current OS file path separator to properly work on Windows.
+    # Default: []
+    paths:
+      - pkg/operators
+issues:
+  # Maximum issues count per one linter.
+  # Set to 0 to disable.
+  # Default: 50
+  max-issues-per-linter: 50
+  # Maximum count of issues with the same text.
+  # Set to 0 to disable.
+  # Default: 3
+  max-same-issues: 10
+formatters:
+  enable:
+    # Check import statements are formatted according to the 'goimport' command.
+    - goimports


@@ -1,5 +1,177 @@
 # Changelog
+## [v0.15.1](https://github.com/kubeflow/arena/tree/v0.15.1) (2025-06-25)
+### Features
+- Add support for configuring tolerations ([#1337](https://github.com/kubeflow/arena/pull/1337) by [@ChenYi015](https://github.com/ChenYi015))
+### Misc
+- Remove kubernetes artifacts ([#1329](https://github.com/kubeflow/arena/pull/1329) by [@ChenYi015](https://github.com/ChenYi015))
+- [CI] Add CI workflow for releasing Arena images ([#1340](https://github.com/kubeflow/arena/pull/1340) by [@ChenYi015](https://github.com/ChenYi015))
+- Update uninstall bash script ([#1335](https://github.com/kubeflow/arena/pull/1335) by [@ChenYi015](https://github.com/ChenYi015))
+- Fix golangci-lint issues ([#1341](https://github.com/kubeflow/arena/pull/1341) by [@ChenYi015](https://github.com/ChenYi015))
+- Bump golang version from 1.22.7 to 1.23.10 ([#1345](https://github.com/kubeflow/arena/pull/1345) by [@ChenYi015](https://github.com/ChenYi015))
+- chore(deps): bump github.com/prometheus/common from 0.60.1 to 0.65.0 ([#1343](https://github.com/kubeflow/arena/pull/1343) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- chore(deps): bump golang.org/x/crypto from 0.38.0 to 0.39.0 ([#1334](https://github.com/kubeflow/arena/pull/1334) by [@dependabot[bot]](https://github.com/apps/dependabot))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.15.0...v0.15.1)
+## [v0.15.0](https://github.com/kubeflow/arena/tree/v0.15.0) (2025-06-04)
+### Features
+- refactor: use helm lib instead of helm binary ([#1207](https://github.com/kubeflow/arena/pull/1207) by [@ChenYi015](https://github.com/ChenYi015))
+- feat: add new value for using localtime in cron-operator ([#1296](https://github.com/kubeflow/arena/pull/1296) by [@ChenYi015](https://github.com/ChenYi015))
+- Delete all services when the TFJob is terminated ([#1316](https://github.com/kubeflow/arena/pull/1316) by [@ChenYi015](https://github.com/ChenYi015))
+- Make number of replicas of cron-operator deployment configurable ([#1325](https://github.com/kubeflow/arena/pull/1325) by [@ChenYi015](https://github.com/ChenYi015))
+- Make number of replicas of tf-operator deployment configurable ([#1323](https://github.com/kubeflow/arena/pull/1323) by [@ChenYi015](https://github.com/ChenYi015))
+- Add custom device support for kserve and kserving. ([#1315](https://github.com/kubeflow/arena/pull/1315) by [@Leoyzen](https://github.com/Leoyzen))
+- Feat: support affinity policy for kserve and tfjob ([#1319](https://github.com/kubeflow/arena/pull/1319) by [@Syspretor](https://github.com/Syspretor))
+- Feat: support separate affinity policy configuration for PS and worke… ([#1331](https://github.com/kubeflow/arena/pull/1331) by [@Syspretor](https://github.com/Syspretor))
+### Bug Fixes
+- fix: job status displays incorrectly ([#1289](https://github.com/kubeflow/arena/pull/1289) by [@ChenYi015](https://github.com/ChenYi015))
+- fix: service account should use release namespace ([#1308](https://github.com/kubeflow/arena/pull/1308) by [@ChenYi015](https://github.com/ChenYi015))
+### Misc
+- Add basic e2e tests ([#1225](https://github.com/kubeflow/arena/pull/1225) by [@ChenYi015](https://github.com/ChenYi015))
+- Bump github.com/containerd/containerd from 1.7.23 to 1.7.27 ([#1290](https://github.com/kubeflow/arena/pull/1290) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Add stale bot to mark stale issues and PRs ([#1141](https://github.com/kubeflow/arena/pull/1141) by [@ChenYi015](https://github.com/ChenYi015))
+- Fix typos in multiple files ([#1304](https://github.com/kubeflow/arena/pull/1304) by [@co63oc](https://github.com/co63oc))
+- Fix typos in multiple files ([#1310](https://github.com/kubeflow/arena/pull/1310) by [@co63oc](https://github.com/co63oc))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.14.2...v0.15.0)
+## [v0.14.2](https://github.com/kubeflow/arena/tree/v0.14.2) (2025-03-10)
+### Misc
+- Fix typos ([#1276](https://github.com/kubeflow/arena/pull/1276) by [@co63oc](https://github.com/co63oc))
+- Update pytorch operator image ([#1281](https://github.com/kubeflow/arena/pull/1281) by [@ChenYi015](https://github.com/ChenYi015))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.14.1...v0.14.2)
+## [v0.14.1](https://github.com/kubeflow/arena/tree/v0.14.1) (2025-02-24)
+### Bug Fixes
+- fix: device value does not support k8s resource quantity ([#1267](https://github.com/kubeflow/arena/pull/1267) by [@ChenYi015](https://github.com/ChenYi015))
+- fix: pytorchjob does not support backoff limit ([#1272](https://github.com/kubeflow/arena/pull/1272) by [@ChenYi015](https://github.com/ChenYi015))
+- unset env NVIDIA_VISIBLE_DEVICES when gpushare is enabled ([#1273](https://github.com/kubeflow/arena/pull/1273) by [@ChenYi015](https://github.com/ChenYi015))
+### Misc
+- docs: fixed typo ([#1257](https://github.com/kubeflow/arena/pull/1257) by [@DBMxrco](https://github.com/DBMxrco))
+- Bump github.com/golang/glog from 1.2.3 to 1.2.4 ([#1263](https://github.com/kubeflow/arena/pull/1263) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- fix: format of tensorflow standalone training docs is messed up ([#1265](https://github.com/kubeflow/arena/pull/1265) by [@ChenYi015](https://github.com/ChenYi015))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.14.0...v0.14.1)
+## [v0.14.0](https://github.com/kubeflow/arena/tree/v0.14.0) (2025-02-12)
+### Features
+- rename parameter ([#1262](https://github.com/kubeflow/arena/pull/1262) by [@gujingit](https://github.com/gujingit))
+### Misc
+- Add changelog for v0.13.1 ([#1248](https://github.com/kubeflow/arena/pull/1248) by [@ChenYi015](https://github.com/ChenYi015))
+- Bump github.com/go-resty/resty/v2 from 2.16.0 to 2.16.5 ([#1254](https://github.com/kubeflow/arena/pull/1254) by [@dependabot[bot]](https://github.com/apps/dependabot))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.13.1...v0.14.0)
+## [v0.13.1](https://github.com/kubeflow/arena/tree/v0.13.1) (2025-01-13)
+### Misc
+- feat: add linux/arm64 support for tf-operator image ([#1238](https://github.com/kubeflow/arena/pull/1238) by [@ChenYi015](https://github.com/ChenYi015))
+- feat: add linux/arm64 support for mpi-operator image ([#1239](https://github.com/kubeflow/arena/pull/1239) by [@ChenYi015](https://github.com/ChenYi015))
+- feat: add linux/arm64 support for cron-operator image ([#1240](https://github.com/kubeflow/arena/pull/1240) by [@ChenYi015](https://github.com/ChenYi015))
+- feat: add linux/arm64 support for et-operator image ([#1241](https://github.com/kubeflow/arena/pull/1241) by [@ChenYi015](https://github.com/ChenYi015))
+- Add PyTorch mnist example ([#1237](https://github.com/kubeflow/arena/pull/1237) by [@ChenYi015](https://github.com/ChenYi015))
+- Update the version of elastic-job-supervisor in arena-artifacts ([#1247](https://github.com/kubeflow/arena/pull/1247) by [@AlanFokCo](https://github.com/AlanFokCo))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.13.0...v0.13.1)
+## [v0.13.0](https://github.com/kubeflow/arena/tree/v0.13.0) (2024-12-23)
+### New Features
+- feat: add support for torchrun ([#1228](https://github.com/kubeflow/arena/pull/1228) by [@ChenYi015](https://github.com/ChenYi015))
+- Update pytorch-operator image ([#1234](https://github.com/kubeflow/arena/pull/1234) by [@ChenYi015](https://github.com/ChenYi015))
+### Bug Fix
+- Avoid listing jobs and statefulsets when get pytorchjob ([#1229](https://github.com/kubeflow/arena/pull/1229) by [@ChenYi015](https://github.com/ChenYi015))
+### Misc
+- Update tfjob standalone training job doc ([#1222](https://github.com/kubeflow/arena/pull/1222) by [@ChenYi015](https://github.com/ChenYi015))
+- Remove archived docs ([#1208](https://github.com/kubeflow/arena/pull/1208) by [@ChenYi015](https://github.com/ChenYi015))
+- Add changelog for v0.12.1 ([#1224](https://github.com/kubeflow/arena/pull/1224) by [@ChenYi015](https://github.com/ChenYi015))
+- Bump golang.org/x/crypto from 0.29.0 to 0.31.0 ([#1231](https://github.com/kubeflow/arena/pull/1231) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Bump google.golang.org/protobuf from 1.35.1 to 1.36.0 ([#1227](https://github.com/kubeflow/arena/pull/1227) by [@dependabot[bot]](https://github.com/apps/dependabot))
+[Full Changelog](https://github.com/kubeflow/arena/compare/v0.12.1...v0.13.0)
+## [v0.12.1](https://github.com/kubeflow/arena/tree/v0.12.1) (2024-11-25)
+### New Features
+- Support MPI Job with generic devices ([#1209](https://github.com/kubeflow/arena/pull/1209) by [@cheyang](https://github.com/cheyang))
+### Bug Fix
+- Update tf-operator image to fix clean pod policy issues ([#1200](https://github.com/kubeflow/arena/pull/1200) by [@ChenYi015](https://github.com/ChenYi015))
+- Fix etjob rendering error when using local logging dir ([#1203](https://github.com/kubeflow/arena/pull/1203) by [@TrafalgarZZZ](https://github.com/TrafalgarZZZ))
+- Fix the functionality of generating kubeconfig (#1204) ([#1205](https://github.com/kubeflow/arena/pull/1205) by [@wqlparallel](https://github.com/wqlparallel))
+- Update cron operator image ([#1214](https://github.com/kubeflow/arena/pull/1214) by [@ChenYi015](https://github.com/ChenYi015))
+### Misc
+- Add changelog for v0.12.0 ([#1199](https://github.com/kubeflow/arena/pull/1199) by [@ChenYi015](https://github.com/ChenYi015))
+- Add go mod vendor check to integration test ([#1198](https://github.com/kubeflow/arena/pull/1198) by [@ChenYi015](https://github.com/ChenYi015))
+- bump github.com/go-resty/resty/v2 from 2.15.3 to 2.16.0 ([#1202](https://github.com/kubeflow/arena/pull/1202) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Publish releases only on master branch ([#1210](https://github.com/kubeflow/arena/pull/1210) by [@ChenYi015](https://github.com/ChenYi015))
+- Add docs for releasing arena ([#1201](https://github.com/kubeflow/arena/pull/1201) by [@ChenYi015](https://github.com/ChenYi015))
+- Bump golang.org/x/crypto from 0.28.0 to 0.29.0 ([#1206](https://github.com/kubeflow/arena/pull/1206) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Release v0.12.1 ([#1215](https://github.com/kubeflow/arena/pull/1215) by [@ChenYi015](https://github.com/ChenYi015))
+[Full Changelog](https://github.com/kubeflow/arena/compare/29b2d6d2...v0.12.1)
+## [v0.12.0](https://github.com/kubeflow/arena/tree/v0.12.0) (2024-11-11)
+### New Features
+- Feat: add support for distributed serving type ([#1187](https://github.com/kubeflow/arena/pull/1187) by [@linnlh](https://github.com/linnlh))
+- Support distributed serving with vendor update ([#1194](https://github.com/kubeflow/arena/pull/1194) by [@cheyang](https://github.com/cheyang))
+### Misc
+- Bump github.com/golang/glog from 1.2.2 to 1.2.3 ([#1189](https://github.com/kubeflow/arena/pull/1189) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Bump github.com/prometheus/common from 0.60.0 to 0.60.1 ([#1182](https://github.com/kubeflow/arena/pull/1182) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Bump mkdocs-material from 9.5.42 to 9.5.44 ([#1190](https://github.com/kubeflow/arena/pull/1190) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Release v0.12.0 ([#1197](https://github.com/kubeflow/arena/pull/1197) by [@ChenYi015](https://github.com/ChenYi015))
+[Full Changelog](https://github.com/kubeflow/arena/compare/46a795e3...v0.12.0)
+## [v0.11.0](https://github.com/kubeflow/arena/tree/v0.11.0) (2024-10-24)
+### New Features
+- Support ray job ([#1123](https://github.com/kubeflow/arena/pull/1123) by [@qile123](https://github.com/qile123))
+### Misc
+- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 ([#1176](https://github.com/kubeflow/arena/pull/1176) by [@dependabot[bot]](https://github.com/apps/dependabot))
+- Bump mkdocs-material from 9.5.40 to 9.5.42 ([#1179](https://github.com/kubeflow/arena/pull/1179) by [@dependabot[bot]](https://github.com/apps/dependabot))
+[Full Changelog](https://github.com/kubeflow/arena/compare/e15cb18...v0.11.0)
 ## [v0.10.1](https://github.com/kubeflow/arena/tree/v0.10.1) (2024-10-14)
 ### Bug Fixes


@@ -1,6 +1,6 @@
 ARG BASE_IMAGE=debian:12-slim
-FROM golang:1.22.7 as builder
+FROM golang:1.24.0 AS builder
 ARG TARGETOS


@@ -3,7 +3,7 @@ ARG BASE_IMAGE=tensorflow/tensorflow:1.12.0-devel-py3
 ARG USER=root
-FROM golang:1.22.7 as build
+FROM golang:1.23.10 AS build
 RUN mkdir -p /go/src/github.com/kubeflow/arena


@@ -2,7 +2,7 @@ ARG BASE_IMAGE=registry.aliyuncs.com/kubeflow-images-public/tensorflow-1.12.0-no
 ARG USER=jovyan
-FROM golang:1.22.7 as build
+FROM golang:1.23.10 AS build
 RUN mkdir -p /go/src/github.com/kubeflow/arena


@@ -18,8 +18,8 @@ DIST_DIR ?= $(CURRENT_DIR)/bin
 ARENA_CLI_NAME ?= arena
 JOB_MONITOR ?= jobmon
 ARENA_UNINSTALL ?= arena-uninstall
-OS ?= linux
-ARCH ?= amd64
+OS ?= $(shell go env GOOS)
+ARCH ?= $(shell go env GOARCH)
 VERSION ?= $(shell cat VERSION)
 BUILD_DATE := $(shell date -u +'%Y-%m-%dT%H:%M:%SZ')
@@ -34,17 +34,26 @@ PACKR_CMD := $(shell if [ "`which packr`" ]; then echo "packr"; else echo "go ru
 LOCALBIN ?= $(CURRENT_DIR)/bin
 # Location to put temp files
 TEMPDIR ?= $(CURRENT_DIR)/tmp
+# ARENA_ARTIFACTS
+ARENA_ARTIFACTS_CHART_PATH ?= $(CURRENT_DIR)/arena-artifacts
 # Versions
 GOLANG_VERSION=$(shell grep -e '^go ' go.mod | cut -d ' ' -f 2)
-KUBECTL_VERSION ?= 1.28.4
-HELM_VERSION ?= 3.13.3
-GOLANGCI_LINT_VERSION ?= 1.57.2
+KUBECTL_VERSION ?= v1.28.4
+HELM_VERSION ?= $(shell grep -e 'helm.sh/helm/v3 ' go.mod | cut -d ' ' -f 2)
+HELM_UNITTEST_VERSION ?= 0.5.1
+KIND_VERSION ?= v0.23.0
+KIND_K8S_VERSION ?= v1.29.3
+ENVTEST_VERSION ?= release-0.18
+ENVTEST_K8S_VERSION ?= 1.29.3
+GOLANGCI_LINT_VERSION ?= v2.1.6
 # Binaries
 ARENA ?= arena-v$(VERSION)-$(OS)-$(ARCH)
-KUBECTL ?= kubectl-v$(KUBECTL_VERSION)-$(OS)-$(ARCH)
-HELM ?= helm-v$(HELM_VERSION)-$(OS)-$(ARCH)
+KUBECTL ?= kubectl-$(KUBECTL_VERSION)-$(OS)-$(ARCH)
+HELM ?= helm-$(HELM_VERSION)-$(OS)-$(ARCH)
+KIND ?= $(LOCALBIN)/kind-$(KIND_VERSION)
+ENVTEST ?= $(LOCALBIN)/setup-envtest-$(ENVTEST_VERSION)
 GOLANGCI_LINT ?= golangci-lint-$(GOLANGCI_LINT_VERSION)
 # Tarballs
@@ -113,6 +122,9 @@ endif
 help: ## Display this help.
 	@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-30s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
+.PHONY: all
+all: go-fmt go-vet go-lint unit-test e2e-test
 ##@ Development
 go-fmt: ## Run go fmt against code.
@@ -136,7 +148,12 @@ go-lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes.
 .PHONY: unit-test
 unit-test: ## Run go unit tests.
 	@echo "Running go test..."
-	go test ./... -coverprofile cover.out
+	go test $(shell go list ./... | grep -v /e2e) -coverprofile cover.out
+.PHONY: e2e-test
+e2e-test: envtest ## Run the e2e tests against a Kind k8s instance that is spun up.
+	@echo "Running e2e tests..."
+	go test ./test/e2e/ -v -ginkgo.v -timeout 30m
 # Build the project
 .PHONY: default
@@ -166,8 +183,7 @@ clean: ## Clean up all downloaded and generated files.
 	rm -rf $(LOCALBIN) $(TEMPDIR)
 .PHONY: arena
-arena: $(LOCALBIN)/$(ARENA) ## Build arena CLI for current platform.
-$(LOCALBIN)/$(ARENA): $(LOCALBIN)
+arena: $(LOCALBIN) ## Build arena CLI for current platform.
 	@echo "Building arena CLI..."
 	CGO_ENABLED=0 GOOS=$(OS) GOARCH=$(ARCH) go build -tags netgo -ldflags '${LDFLAGS}' -o $(LOCALBIN)/$(ARENA) cmd/arena/main.go
@@ -219,30 +235,41 @@ build-dependabot:
 arena-installer: $(ARENA_INSTALLER_TARBALL) ## Build arena installer tarball
 $(ARENA_INSTALLER_TARBALL): arena kubectl helm
 	echo "Building arena installer tarball..." && \
 	rm -rf $(TEMPDIR)/$(ARENA_INSTALLER) && \
 	mkdir -p $(TEMPDIR)/$(ARENA_INSTALLER)/bin && \
 	cp $(LOCALBIN)/$(ARENA) $(TEMPDIR)/$(ARENA_INSTALLER)/bin/arena && \
 	cp $(LOCALBIN)/$(KUBECTL) $(TEMPDIR)/$(ARENA_INSTALLER)/bin/kubectl && \
 	cp $(LOCALBIN)/$(HELM) $(TEMPDIR)/$(ARENA_INSTALLER)/bin/helm && \
 	cp -R charts $(TEMPDIR)/$(ARENA_INSTALLER) && \
 	cp -R arena-artifacts $(TEMPDIR)/$(ARENA_INSTALLER) && \
 	cp -R kubernetes-artifacts $(TEMPDIR)/$(ARENA_INSTALLER) && \
 	cp arena-gen-kubeconfig.sh $(TEMPDIR)/$(ARENA_INSTALLER)/bin && \
 	cp install.sh $(TEMPDIR)/$(ARENA_INSTALLER) && \
 	cp uninstall.sh $(TEMPDIR)/$(ARENA_INSTALLER)/bin/arena-uninstall && \
 	tar -zcf $(ARENA_INSTALLER).tar.gz -C $(TEMPDIR) $(ARENA_INSTALLER) && \
 	echo "Successfully saved arena installer to $(ARENA_INSTALLER).tar.gz."
+##@ Helm
+.PHONY: helm-unittest
+helm-unittest: helm-unittest-plugin ## Run Helm chart unittests.
+	set -x && $(LOCALBIN)/$(HELM) unittest $(ARENA_ARTIFACTS_CHART_PATH) --strict --file "tests/**/*_test.yaml" --chart-tests-path $(CURRENT_DIR)
 ##@ Dependencies
 .PHONY: golangci-lint
 golangci-lint: $(LOCALBIN)/$(GOLANGCI_LINT) ## Download golangci-lint locally if necessary.
 $(LOCALBIN)/$(GOLANGCI_LINT): $(LOCALBIN)
-	$(call go-install-tool,$(LOCALBIN)/$(GOLANGCI_LINT),github.com/golangci/golangci-lint/cmd/golangci-lint,${GOLANGCI_LINT_VERSION})
+	$(call go-install-tool,$(LOCALBIN)/$(GOLANGCI_LINT),github.com/golangci/golangci-lint/v2/cmd/golangci-lint,${GOLANGCI_LINT_VERSION})
+.PHONY: envtest
+envtest: $(ENVTEST) ## Download setup-envtest locally if necessary.
+$(ENVTEST): $(LOCALBIN)
+	$(call go-install-tool,$(ENVTEST),sigs.k8s.io/controller-runtime/tools/setup-envtest,$(ENVTEST_VERSION))
 .PHONY: kubectl
 kubectl: $(LOCALBIN)/$(KUBECTL)
 $(LOCALBIN)/$(KUBECTL): $(LOCALBIN) $(TEMPDIR)
-	$(eval KUBECTL_URL=https://dl.k8s.io/release/v$(KUBECTL_VERSION)/bin/$(OS)/$(ARCH)/kubectl)
+	$(eval KUBECTL_URL=https://dl.k8s.io/release/$(KUBECTL_VERSION)/bin/$(OS)/$(ARCH)/kubectl)
 	$(eval KUBECTL_SHA_URL=$(KUBECTL_URL).sha256)
 	cd $(TEMPDIR) && \
@@ -278,11 +305,18 @@ $(LOCALBIN)/$(HELM): $(LOCALBIN) $(TEMPDIR)
 	fi && \
 	echo "Verifying checksum..." && \
 	cat $(HELM).tar.gz.sha256sum | shasum -a 256 --check --quiet || (echo "Checksum verification failed, exiting." && false) && \
-	echo "Extrat helm tarball and move it to bin directory..." && \
+	echo "Extract helm tarball and move it to bin directory..." && \
 	tar -zxf $(HELM).tar.gz && \
 	cp ${OS}-${ARCH}/helm $(LOCALBIN)/$(HELM) && \
 	echo "Successfully installed helm to $(LOCALBIN)/$(HELM)."
+.PHONY: helm-unittest-plugin
+helm-unittest-plugin: helm ## Download helm unittest plugin locally if necessary.
+	if [ -z "$(shell $(LOCALBIN)/$(HELM) plugin list | grep unittest)" ]; then \
+		echo "Installing helm unittest plugin"; \
+		$(LOCALBIN)/$(HELM) plugin install https://github.com/helm-unittest/helm-unittest.git --version $(HELM_UNITTEST_VERSION); \
+	fi
 # go-install-tool will 'go install' any package with custom target and name of binary, if it doesn't exist
 # $1 - target path with name of binary (ideally with version)
 # $2 - package url which can be installed
@@ -290,7 +324,7 @@ $(LOCALBIN)/$(HELM): $(LOCALBIN) $(TEMPDIR)
 define go-install-tool
 @[ -f $(1) ] || { \
 set -e; \
-package=$(2)@v$(3) ;\
+package=$(2)@$(3) ;\
 echo "Downloading $${package}" ;\
 GOBIN=$(LOCALBIN) go install $${package} ;\
 mv "$$(echo "$(1)" | sed "s/-$(3)$$//")" $(1) ;\


@@ -1,6 +1,6 @@
 # Arena
-[![Integration Test](https://github.com/kubeflow/arena/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/arena/actions/workflows/integration.yaml)[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/arena)](https://goreportcard.com/report/github.com/kubeflow/arena)
+[![GitHub release](https://img.shields.io/github/v/release/kubeflow/arena)](https://github.com/kubeflow/arena/releases) [![Integration Test](https://github.com/kubeflow/arena/actions/workflows/integration.yaml/badge.svg)](https://github.com/kubeflow/arena/actions/workflows/integration.yaml) [![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/arena)](https://goreportcard.com/report/github.com/kubeflow/arena)
 View the [Arena documentation](https://arena-docs.readthedocs.io/en/latest).
@@ -59,7 +59,7 @@ Then you can analyze the profile by following [Go CPU profiling: pprof and speed
 ## Adopters
-If you are intrested in Arena and would like to share your experiences with others, you are warmly welcome to add your information on [ADOPTERS.md](docs/about/ADOPTERS.md) page. We will continuousely discuss new requirements and feature design with you in advance.
+If you are interested in Arena and would like to share your experiences with others, you are warmly welcome to add your information on [ADOPTERS.md](docs/about/ADOPTERS.md) page. We will continuously discuss new requirements and feature design with you in advance.
 ## FAQ


@@ -49,13 +49,13 @@ Objectives: "Simplify the user experience of the data scientists and provide a l
 * Submit and manage Model Serving with [KF Serving](https://github.com/kubeflow/kfserving)
-Objectives: "Make Arena support the same Operator compatiable with different API version, so the upgrade of operator doesn't impact the existing users' experiences."
+Objectives: "Make Arena support the same Operator compatible with different API version, so the upgrade of operator doesn't impact the existing users' experiences."
 * Compatibility:
   * v1aphla2 and v1 TFJob
   * v1alpha1 and v1aphla2 MPIJob
-Objectives: "Enchance the software quality of Arena so it can be in the quick iteration"
+Objectives: "Enhance the software quality of Arena so it can be in the quick iteration"
 * Refactor the source code
 * Move Training implementation from `cmd` into `pkg`


@@ -1 +1 @@
-0.10.1
+0.15.1


@@ -1,16 +0,0 @@
-# Adopters Of Arena
-Below are the adopters of project Arena. If you are using Arena to improve efficiency and productivity in Machine Learning with Kubernetes, please feel free to add yourself into the following list by a pull request. There're several phases as follow:
-* **Evaluation:** Known Arena, that's interesting; evaluating the features/scopes of Arena
-* **Testing:** Take Arena as one of candidates, testing Kubernetes cluster with Arena
-* **Staging:** Decide to use Arena, testing it in pre-product environment
-* **Production:** Already put Arena into product environment
-| Organization | Contact | Phases | Description of Use |
-| ------------ | ------- | ----------- | ------------------ |
-| [Weibo](https://www.weibo.com) | [@phoenixwu0229](https://github.com/phoenixwu0229) | **Production** | Weibo ML Platform |
-| [HUYA](https://www.huya.com) | [@BobLiu20](https://github.com/bobliu20) | **Production** | HUYA AI Platform |
-| [Microsoft](https://www.microsoft.com) | [@chaowangnk1](https://github.com/chaowangnk1) | **Testing** | AzureML DataCache internal benchmark system |
-| [Unisound](https://www.unisound.com) | [@xieydd](https://github.com/xieydd) | **Production** | Unisound ATLAS AI Platform |
-| [DOUYU](https://www.douyu.com) | [@gongcan1219](https://github.com/gongcan1219) | **Production** | DOUYU AI Platform |

View File

@ -1,40 +0,0 @@
## arena
arena is the command line interface to Arena
### Synopsis
arena is the command line interface to Arena
```
arena [flags]
```
### Options
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
-h, --help help for arena
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena completion](arena_completion.md) - output shell completion code for the specified shell (bash or zsh)
* [arena data](arena_data.md) - manage data.
* [arena delete](arena_delete.md) - delete a training job and its associated pods
* [arena get](arena_get.md) - display details of a training job
* [arena list](arena_list.md) - list all the training jobs
* [arena logs](arena_logs.md) - print the logs for a task of the training job
* [arena logviewer](arena_logviewer.md) - display Log Viewer URL of a training job
* [arena prune](arena_prune.md) - prune history job
* [arena serve](arena_serve.md) - Serve a job.
* [arena submit](arena_submit.md) - Submit a job.
* [arena top](arena_top.md) - Display Resource (GPU) usage.
* [arena version](arena_version.md) - Print version information
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,43 +0,0 @@
## arena completion
output shell completion code for the specified shell (bash or zsh)
### Synopsis
Write bash or zsh shell completion code to standard output.
For bash, ensure you have bash completions installed and enabled.
To access completions in your current shell, run
$ source <(arena completion bash)
Alternatively, write it to a file and source it in .bash_profile
For zsh, output to a file in a directory referenced by the $fpath shell
variable.
```
arena completion SHELL [flags]
```
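For example, the following are typical ways to wire this up (the zsh completion directory is an assumption; use any directory on your `$fpath`):
```
# bash: load completions into the current shell
source <(arena completion bash)

# zsh: write the completion script to a directory on $fpath
arena completion zsh > ~/.zsh/completions/_arena
```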
### Options
```
-h, --help help for completion
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,39 +0,0 @@
## arena data
manage data.
### Synopsis
manage data volumes.
Available Commands:
list,ls List the data volumes.
```
arena data [flags]
```
### Options
```
-h, --help help for data
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
* [arena data list](arena_data_list.md) - list all the data volume.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,35 +0,0 @@
## arena data list
list all the data volume.
### Synopsis
list all the data volume.
```
arena data list [flags]
```
### Options
```
--allNamespaces show all the namespaces
-h, --help help for list
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena data](arena_data.md) - manage data.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,35 +0,0 @@
## arena delete
delete a training job and its associated pods
### Synopsis
delete a training job and its associated pods
```
arena delete a training job [flags]
```
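For example, deleting a TFJob by name might look like this (the job name `tf-git` is a placeholder):
```
arena delete tf-git --type tfjob
```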
### Options
```
-h, --help help for delete
--type string The training type to delete, the possible option is tfjob, mpijob, horovodjob or standalonejob. (optional)
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,37 +0,0 @@
## arena get
display details of a training job
### Synopsis
display details of a training job
```
arena get training job [flags]
```
### Options
```
-e, --events Specify if show pending pod's events.
-h, --help help for get
-o, --output string Output format. One of: json|yaml|wide
      --type string     The training type to get, the possible option is tfjob, mpijob, horovodjob or standalonejob. (optional)
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,35 +0,0 @@
## arena list
list all the training jobs
### Synopsis
list all the training jobs
```
arena list [flags]
```
### Options
```
--allNamespaces show all the namespaces
-h, --help help for list
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,41 +0,0 @@
## arena logs
print the logs for a task of the training job
### Synopsis
print the logs for a task of the training job
```
arena logs training job [flags]
```
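For example, following the most recent lines from one task instance might look like this (the job and instance names are placeholders):
```
# stream the last 100 log lines of a specific worker instance
arena logs tf-git -i tf-git-tfjob-worker-0 --tail 100 -f
```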
### Options
```
-f, --follow Specify if the logs should be streamed.
-h, --help help for logs
-i, --instance string Specify the task instance to get log
--since string Only return logs newer than a relative duration like 5s, 2m, or 3h. Defaults to all logs. Only one of since-time / since may be used.
--since-time string Only return logs after a specific date (RFC3339). Defaults to all logs. Only one of since-time / since may be used.
--tail int Lines of recent log file to display. Defaults to -1 with no selector, showing all log lines otherwise 10, if a selector is provided. (default -1)
--timestamps Include timestamps on each line in the log output
--type string The training type to show logging, the possible option is tfjob, mpijob, horovodjob or standalonejob. (optional)
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,34 +0,0 @@
## arena logviewer
display Log Viewer URL of a training job
### Synopsis
display Log Viewer URL of a training job
```
arena logviewer job [flags]
```
### Options
```
-h, --help help for logviewer
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,35 +0,0 @@
## arena prune
prune history job
### Synopsis
prune history job
```
arena prune history job [flags]
```
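For example, pruning finished jobs that have lived longer than three days might look like this (the duration is illustrative):
```
arena prune -s 72h
```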
### Options
```
-h, --help help for prune
-s, --since duration Clean job that live longer than relative duration like 5s, 2m, or 3h. (default -1ns)
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,43 +0,0 @@
## arena serve
Serve a job.
### Synopsis
serve a job.
Available Commands:
tensorflow,tf Submit a TensorFlow Serving Job.
tensorrt,trt Submit a TensorRT Job
```
arena serve [flags]
```
### Options
```
-h, --help help for serve
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
* [arena serve delete](arena_serve_delete.md) - delete a serving job and its associated pods
* [arena serve list](arena_serve_list.md) - list all the serving jobs
* [arena serve tensorflow](arena_serve_tensorflow.md) - Submit tensorflow serving job to deploy and serve machine learning models.
* [arena serve tensorrt](arena_serve_tensorrt.md) - Submit tensorRT inference serving job to deploy and serve machine learning models.
* [arena serve traffic-split](arena_serve_traffic-split.md) - Adjust traffic routing dynamically for tfserving jobs
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,34 +0,0 @@
## arena serve delete
delete a serving job and its associated pods
### Synopsis
delete a serving job and its associated pods
```
arena serve delete a serving job [flags]
```
### Options
```
-h, --help help for delete
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena serve](arena_serve.md) - Serve a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,34 +0,0 @@
## arena serve list
list all the serving jobs
### Synopsis
list all the serving jobs
```
arena serve list [flags]
```
### Options
```
-h, --help help for list
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena serve](arena_serve.md) - Serve a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,54 +0,0 @@
## arena serve tensorflow
Submit tensorflow serving job to deploy and serve machine learning models.
### Synopsis
Submit tensorflow serving job to deploy and serve machine learning models.
```
arena serve tensorflow [flags]
```
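Putting a few of the flags below together, a minimal submission could look like the following sketch; the serving name, model name, data source, and mount path are hypothetical:
```
arena serve tensorflow \
    --servingName=mymodel \
    --modelName=mymodel \
    --modelPath=/tfmodel/mymodel \
    --data=model-datasource:/tfmodel \
    --versionPolicy=latest \
    --replicas=2
```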
### Options
```
--command string the command will inject to container's command.
--cpu string the request cpu of each replica to run the serve.
-d, --data stringArray specify the trained models datasource to mount for serving, like <name_of_datasource>:<mount_point_on_job>
--enableIstio enable Istio for serving or not (disable Istio by default)
-e, --envs stringArray the environment variables
--exposeService expose service using Istio gateway for external access or not (not expose by default)
--gpumemory int the limit GPU memory of each replica to run the serve.
--gpus int the limit GPU count of each replica to run the serve.
-h, --help help for tensorflow
--image string the docker image name of serve job, and the default image is tensorflow/serving:latest (default "tensorflow/serving:latest")
--imagePullPolicy string the policy to pull the image, and the default policy is IfNotPresent (default "IfNotPresent")
--memory string the request memory of each replica to run the serve.
--modelConfigFile string Corresponding with --model_config_file in tensorflow serving
--modelName string the model name for serving
--modelPath string the model path for serving in the container
--port int the port of tensorflow gRPC listening port (default 8500)
--replicas int the replicas number of the serve job. (default 1)
--restfulPort int the port of tensorflow RESTful listening port (default 8501)
--servingName string the serving name
--servingVersion string the serving version
--versionPolicy string support latest, latest:N, specific:N, all
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena serve](arena_serve.md) - Serve a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,55 +0,0 @@
## arena serve tensorrt
Submit tensorRT inference serving job to deploy and serve machine learning models.
### Synopsis
Submit tensorRT inference serving job to deploy and serve machine learning models.
```
arena serve tensorrt [flags]
```
### Options
```
--allowMetrics Open Metric
--command string the command will inject to container's command.
--cpu string the request cpu of each replica to run the serve.
-d, --data stringArray specify the trained models datasource to mount for serving, like <name_of_datasource>:<mount_point_on_job>
--enableIstio enable Istio for serving or not (disable Istio by default)
-e, --envs stringArray the environment variables
--exposeService expose service using Istio gateway for external access or not (not expose by default)
--gpumemory int the limit GPU memory of each replica to run the serve.
--gpus int the limit GPU count of each replica to run the serve.
--grpcPort int the port of grpc serving server (default 8001)
-h, --help help for tensorrt
--httpPort int the port of http serving server (default 8000)
--image string the docker image name of serve job, and the default image is registry.cn-beijing.aliyuncs.com/xiaozhou/tensorrt-serving:18.12-py3 (default "registry.cn-beijing.aliyuncs.com/xiaozhou/tensorrt-serving:18.12-py3")
--imagePullPolicy string the policy to pull the image, and the default policy is IfNotPresent (default "IfNotPresent")
--memory string the request memory of each replica to run the serve.
--metricPort int the port of metrics server (default 8002)
--modelName string the model name for serving
--modelPath string the model path for serving in the container
--modelStore string the path of tensorRT model path
--replicas int the replicas number of the serve job. (default 1)
--servingName string the serving name
--servingVersion string the serving version
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena serve](arena_serve.md) - Serve a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,36 +0,0 @@
## arena serve traffic-router-split
Adjust traffic routing dynamically for tfserving jobs
### Synopsis
Adjust traffic routing dynamically for tfserving jobs
```
arena serve traffic-router-split [flags]
```
### Options
```
-h, --help help for traffic-router-split
--servingName string the serving name
--versions string Model versions which the traffic will be routed to, e.g. [1,2,3] (default "[]")
--weights string Weight percentage values for each model version which the traffic will be routed to,e.g. [70,20,10] (default "[]")
```
### Options inherited from parent commands
```
--arenaNamespace string The namespace of arena system service, like TFJob (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
--namespace string the namespace of the job (default "default")
--pprof enable cpu profile
```
### SEE ALSO
* [arena serve](arena_serve.md) - Serve a job.
###### Auto generated by spf13/cobra on 7-Sep-2018

View File

@ -1,37 +0,0 @@
## arena serve traffic-split
Adjust traffic routing dynamically for tfserving jobs
### Synopsis
Adjust traffic routing dynamically for tfserving jobs
```
arena serve traffic-split [flags]
```
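For example, routing 80% of traffic to model version 1 and 20% to version 2 might look like this (the serving name is a placeholder):
```
arena serve traffic-split \
    --servingName=mymodel \
    --servingVersions=1,2 \
    --weights=80,20
```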
### Options
```
-h, --help help for traffic-split
--servingName string the serving name
--servingVersions string Model versions which the traffic will be routed to, e.g. 1,2,3
--weights string Weight percentage values for each model version which the traffic will be routed to,e.g. 70,20,10
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena serve](arena_serve.md) - Serve a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,47 +0,0 @@
## arena submit
Submit a job.
### Synopsis
Submit a job.
Available Commands:
tfjob,tf Submit a TFJob.
horovod,hj Submit a Horovod Job.
mpijob,mpi Submit a MPIJob.
standalonejob,sj Submit a standalone Job.
tfserving,tfserving Submit a Serving Job.
sparkjob,spark Submit a Spark Job.
```
arena submit [flags]
```
### Options
```
-h, --help help for submit
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
* [arena submit horovodjob](arena_submit_horovodjob.md) - Submit horovodjob as training job.
* [arena submit mpijob](arena_submit_mpijob.md) - Submit MPIjob as training job.
* [arena submit standalonejob](arena_submit_standalonejob.md) - Submit StandaloneJob as training job. And it will be deprecated soon, please use tfjob instead.
* [arena submit tfjob](arena_submit_tfjob.md) - Submit TFJob as training job.
* [arena submit sparkjob](arena_submit_sparkjob.md) - Submit SparkJob as training job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,51 +0,0 @@
## arena submit horovodjob
Submit horovodjob as training job.
### Synopsis
Submit horovodjob as training job.
```
arena submit horovodjob [flags]
```
### Options
```
-a, --annotation stringArray the annotations
--cpu string the cpu resource to use for the training, like 1 for 1 core.
-d, --data stringArray specify the datasource to mount to the job, like <name_of_datasource>:<mount_point_on_job>
--data-dir stringArray the data dir. If you specify /data, it means mounting hostpath /data into container path /data
-e, --env stringArray the environment variables
--gpus int the GPU count of each worker to run the training.
-h, --help help for horovodjob
--image string the docker image name of training job
--memory string the memory resource to use for the training, like 1Gi.
--name string override name
--rdma enable RDMA
--retry int retry times.
--sshPort int ssh port.
--sync-image string the docker image of syncImage
--sync-mode string syncMode: support rsync, hdfs, git
--sync-source string sync-source: for rsync, it's like 10.88.29.56::backup/data/logoRecoTrain.zip; for git, it's like https://github.com/kubeflow/tf-operator.git
--workers int the worker number to run the distributed training. (default 1)
--working-dir string working directory to extract the code. If using syncMode, the $workingDir/code contains the code (default "/root")
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena submit](arena_submit.md) - Submit a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,53 +0,0 @@
## arena submit mpijob
Submit MPIjob as training job.
### Synopsis
Submit MPIjob as training job.
```
arena submit mpijob [flags]
```
### Options
```
-a, --annotation stringArray the annotations
--cpu string the cpu resource to use for the training, like 1 for 1 core.
-d, --data stringArray specify the datasource to mount to the job, like <name_of_datasource>:<mount_point_on_job>
--data-dir stringArray the data dir. If you specify /data, it means mounting hostpath /data into container path /data
-e, --env stringArray the environment variables
--gpus int the GPU count of each worker to run the training.
-h, --help help for mpijob
--image string the docker image name of training job
--logdir string the training logs dir, default is /training_logs (default "/training_logs")
--memory string the memory resource to use for the training, like 1Gi.
--name string override name
--rdma enable RDMA
--retry int retry times.
--sync-image string the docker image of syncImage
--sync-mode string syncMode: support rsync, hdfs, git
--sync-source string sync-source: for rsync, it's like 10.88.29.56::backup/data/logoRecoTrain.zip; for git, it's like https://github.com/kubeflow/tf-operator.git
--tensorboard enable tensorboard
--tensorboard-image string the docker image for tensorboard (default "registry.cn-zhangjiakou.aliyuncs.com/tensorflow-samples/tensorflow:1.12.0-devel")
--workers int the worker number to run the distributed training. (default 1)
--working-dir string working directory to extract the code. If using syncMode, the $workingDir/code contains the code (default "/root")
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena submit](arena_submit.md) - Submit a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,37 +0,0 @@
## arena submit sparkjob
Submit SparkJob as training job.
### Synopsis
Submit SparkJob as training job.
```
arena submit sparkjob [flags]
```
### Options
```
--image string the docker image name of training job
--jar string jar path in image
--main-class string main class of your jar
--name string override name
--workers int the worker number to run the distributed training. (default 1)
```
### Options inherited from parent commands
```
--arenaNamespace string The namespace of arena system service, like TFJob (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
--namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena submit](arena_submit.md) - Submit a job.

View File

@ -1,52 +0,0 @@
## arena submit standalonejob(deprecated)
**Warning: standalonejob has been deprecated,please use [tfjob](../userguide/1-tfjob-standalone.md) instead.**
Submit StandaloneJob as training job. And it will be deprecated soon, please use tfjob instead.
### Synopsis
Submit StandaloneJob as training job. And it will be deprecated soon, please use tfjob instead.
```
arena submit standalonejob [flags]
```
### Options
```
-a, --annotation stringArray the annotations
--cpu string the cpu resource to use for the training, like 1 for 1 core.
-d, --data stringArray specify the datasource to mount to the job, like <name_of_datasource>:<mount_point_on_job>
--data-dir stringArray the data dir. If you specify /data, it means mounting hostpath /data into container path /data
-e, --env stringArray the environment variables
--gpus int the GPU count of each worker to run the training.
-h, --help help for standalonejob
--image string the docker image name of training job
--memory string the memory resource to use for the training, like 1Gi.
--name string override name
--rdma enable RDMA
--retry int retry times.
--sync-image string the docker image of syncImage
--sync-mode string syncMode: support rsync, hdfs, git
--sync-source string sync-source: for rsync, it's like 10.88.29.56::backup/data/logoRecoTrain.zip; for git, it's like https://github.com/kubeflow/tf-operator.git
--workers int the worker number to run the distributed training. (default 1)
--working-dir string working directory to extract the code. If using syncMode, the $workingDir/code contains the code (default "/root")
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena submit](arena_submit.md) - Submit a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,68 +0,0 @@
## arena submit tfjob
Submit TFJob as training job.
### Synopsis
Submit TFJob as training job.
```
arena submit tfjob [flags]
```
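As a sketch, a distributed submission with one parameter server and TensorBoard enabled could combine the flags below; the training script path is a placeholder:
```
arena submit tfjob \
    --name=tf-dist \
    --workers=2 \
    --gpus=1 \
    --ps=1 \
    --tensorboard \
    --image=tensorflow/tensorflow:1.5.0-devel-gpu \
    "python /app/main.py"
```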
### Options
```
-a, --annotation stringArray the annotations
--chief enable chief, which is required for estimator.
--chief-cpu string the cpu resource to use for the Chief, like 1 for 1 core.
--chief-memory string the memory resource to use for the Chief, like 1Gi.
--chief-port int the port of the chief.
--clean-task-policy string How to clean tasks after Training is done, only support Running, None. (default "Running")
-d, --data stringArray specify the datasource to mount to the job, like <name_of_datasource>:<mount_point_on_job>
--data-dir stringArray the data dir. If you specify /data, it means mounting hostpath /data into container path /data
-e, --env stringArray the environment variables
--evaluator enable evaluator, which is optional for estimator.
--evaluator-cpu string the cpu resource to use for the evaluator, like 1 for 1 core.
--evaluator-memory string the memory resource to use for the evaluator, like 1Gi.
--gpus int the GPU count of each worker to run the training.
-h, --help help for tfjob
--image string the docker image name of training job
--logdir string the training logs dir, default is /training_logs (default "/training_logs")
--name string override name
--ps int the number of the parameter servers.
--ps-cpu string the cpu resource to use for the parameter servers, like 1 for 1 core.
--ps-image string the docker image for tensorflow workers
--ps-memory string the memory resource to use for the parameter servers, like 1Gi.
--ps-port int the port of the parameter server.
--rdma enable RDMA
--retry int retry times.
--sync-image string the docker image of syncImage
--sync-mode string syncMode: support rsync, hdfs, git
--sync-source string sync-source: for rsync, it's like 10.88.29.56::backup/data/logoRecoTrain.zip; for git, it's like https://github.com/kubeflow/tf-operator.git
--tensorboard enable tensorboard
--tensorboard-image string the docker image for tensorboard (default "registry.cn-zhangjiakou.aliyuncs.com/tensorflow-samples/tensorflow:1.12.0-devel")
--worker-cpu string the cpu resource to use for the worker, like 1 for 1 core.
--worker-image string the docker image for tensorflow workers
--worker-memory string the memory resource to use for the worker, like 1Gi.
--worker-port int the port of the worker.
--workers int the worker number to run the distributed training. (default 1)
--working-dir string working directory to extract the code. If using syncMode, the $workingDir/code contains the code (default "/root")
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena submit](arena_submit.md) - Submit a job.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,41 +0,0 @@
## arena top
Display Resource (GPU) usage.
### Synopsis
Display Resource (GPU) usage.
Available Commands:
node Display Resource (GPU) usage of nodes
job Display Resource (GPU) usage of pods
```
arena top [flags]
```
### Options
```
-h, --help help for top
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
* [arena top job](arena_top_job.md) - Display Resource (GPU) usage of jobs.
* [arena top node](arena_top_node.md) - Display Resource (GPU) usage of nodes.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,37 +0,0 @@
## arena top job
Display Resource (GPU) usage of jobs.
### Synopsis
Display Resource (GPU) usage of jobs.
```
arena top job [flags]
```
### Options
```
--allNamespaces show all the namespaces
-h, --help help for job
-i, --instance string Display instance top info
-r, --refresh Display continuously
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena top](arena_top.md) - Display Resource (GPU) usage.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,35 +0,0 @@
## arena top node
Display Resource (GPU) usage of nodes.
### Synopsis
Display Resource (GPU) usage of nodes.
```
arena top node [flags]
```
### Options
```
-d, --details Display details
-h, --help help for node
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena top](arena_top.md) - Display Resource (GPU) usage.
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,35 +0,0 @@
## arena version
Print version information
### Synopsis
Print version information
```
arena version [flags]
```
### Options
```
-h, --help help for version
--short print just the version number
```
### Options inherited from parent commands
```
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job (default "default")
--pprof enable cpu profile
--trace enable trace
```
### SEE ALSO
* [arena](arena.md) - arena is the command line interface to Arena
###### Auto generated by spf13/cobra on 24-Apr-2019

View File

@ -1,50 +0,0 @@
## The TFJob plugin framework
Use this framework if you'd like to customize or enhance the TFJob with your own chart or code.
## Developer Workflow
### Step 1: Implement the following function (optional)
```
// Customized runtime for tf training
type tfRuntime interface {
// check the tfjob args
check(tf *submitTFJobArgs) (err error)
// transform the tfjob
transform(tf *submitTFJobArgs) (err error)
getChartName() string
}
```
You can refer to the implementation of the default tf runtime in [../../cmd/arena/commands/training_plugin_interface.go](training_plugin_interface.go).
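As an illustration, a minimal `mock` runtime satisfying this interface might look like the sketch below; the validation logic is hypothetical, and `submitTFJobArgs` is the argument struct from the default runtime linked above:
```
// mockRuntime is a hypothetical tfRuntime implementation for a `mock` chart.
type mockRuntime struct{}

// check validates the tfjob args before the chart is rendered.
// (assumes "errors" is imported)
func (m mockRuntime) check(tf *submitTFJobArgs) (err error) {
	if tf == nil {
		return errors.New("no TFJob arguments provided")
	}
	return nil
}

// transform mutates the tfjob args, e.g. to inject defaults; a no-op here.
func (m mockRuntime) transform(tf *submitTFJobArgs) (err error) {
	return nil
}

// getChartName points the submission at the copied `mock` chart directory.
func (m mockRuntime) getChartName() string {
	return "mock"
}
```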
### Step 2. Create your own chart
If you don't need custom code for `check` or `transform`, you can simply create the chart in the same directory as tfjob and mpijob. For example, for a chart named `mock`:
```
cd /charts
cp -r tfjob mock
```
## User Workflow
Just run the command, specifying the annotation `runtime={your runtime}`:
```
arena submit tf \
--name=test \
--annotation="runtime=mock" \
--workers=1 \
--chief \
--chief-cpu=4 \
--evaluator \
--evaluator-cpu=4 \
--worker-cpu=2 \
"python test.py"
```

View File

@ -1,118 +0,0 @@
## Setup
This documentation assumes you have a Kubernetes cluster already available.
If you need help setting up a Kubernetes cluster please refer to [Kubernetes Setup](https://kubernetes.io/docs/setup/).
If you want to use GPUs, be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
Arena doesn't have to run within the Kubernetes cluster; it can also run on your laptop. If you can run `kubectl` to manage the Kubernetes cluster there, you can also use `arena` to manage training jobs.
### Requirements
* Linux OS
* Kubernetes >= 1.11, kubectl >= 1.11
* helm version [v2.14.1](https://docs.helm.sh/using_helm/#installing-helm) or later
* tiller with the same version as helm should also be installed (https://docs.helm.sh/using_helm/#installing-tiller)
### Steps
1\. Prepare kubeconfig file by using `export KUBECONFIG=/etc/kubernetes/admin.conf` or creating a `~/.kube/config`
2\. Download the latest installer from [Release Page](https://github.com/kubeflow/arena/releases), and rename it to `arena-installer.tar.gz`
3\. Untar the installer package
```
# tar -xvf arena-installer.tar.gz
```
4\. Set up environment variables for customization
4.1\. If you'd like to run training and serving on the host network
```
export USE_HOSTNETWORK=true
```
4.2\. If you'd like to customize the Kubernetes namespace of the arena infrastructure
```
export NAMESPACE={your namespace}
```
4.3\. If you'd like to use your private docker registry instead of `ACR (Alibaba Cloud Container Registry)`:
```
export DOCKER_REGISTRY={your docker registry}
```
4.4\. If you'd like to deploy prometheus in `ACK (Alibaba Container Service for Kubernetes)`
```
export USE_PROMETHEUS=true
export PLATFORM=ack
```
4.5\. If you'd like to use a cloud load balancer
```
export USE_LOADBALANCER=true
```
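For example, a customized installation might export several of these variables together before running the installer; the registry and namespace values here are placeholders:
```
export USE_HOSTNETWORK=true
export NAMESPACE=arena
export DOCKER_REGISTRY=registry.example.com/arena
export USE_LOADBALANCER=true
```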
5\. Install arena
```
# cd arena-installer
# sudo ./install.sh
```
6\. Enable shell autocompletion
On Linux, please use bash
On CentOS Linux, you may need to install the bash-completion package, which is not installed by default.
```
yum install bash-completion -y
```
On Debian or Ubuntu Linux you may need to install with
```
apt-get install bash-completion
```
To add arena autocompletion to your current shell, run `source <(arena completion bash)`.
On MacOS, please use bash
You can install it with Homebrew:
```
brew install bash-completion@2
```
To add arena autocompletion to your profile so it is automatically loaded in future shells, run:
```
echo "source <(arena completion bash)" >> ~/.bashrc
chmod u+x ~/.bashrc
```
For MacOS, add the following to your `~/.bashrc` file:
```
echo "source $(brew --prefix)/etc/profile.d/bash_completion.sh" >> ~/.bashrc
```
Then you can use [tab] to auto-complete the command:
```
# arena list
NAME STATUS TRAINER AGE NODE
tf1 PENDING TFJOB 0s N/A
caffe-1080ti-1 RUNNING HOROVOD 45s 192.168.1.120
# arena get [tab]
caffe-1080ti-1 tf1
```

View File

@ -1,157 +0,0 @@
## Setup
This documentation assumes you have a Kubernetes cluster already available.
If you need help setting up a Kubernetes cluster please refer to [Kubernetes Setup](https://kubernetes.io/docs/setup/).
If you want to use GPUs, be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
Arena doesn't have to run within the Kubernetes cluster; it can also run on your laptop. If you can run `kubectl` to manage the Kubernetes cluster there, you can also use `arena` to manage training jobs.
### Requirements
* Kubernetes >= 1.11, kubectl >= 1.11
* helm version [v2.14.1](https://docs.helm.sh/using_helm/#installing-helm) or later
* tiller with the same version as helm should also be installed (https://docs.helm.sh/using_helm/#installing-tiller)
### Steps
1\. Prepare kubeconfig file by using `export KUBECONFIG=/etc/kubernetes/admin.conf` or creating a `~/.kube/config`
2\. Install kubectl client
Please follow [kubectl installation guide](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
3\. Install Helm client
- Download Helm client from [github.com](https://github.com/helm/helm/releases)
- Unpack it (tar -zxvf helm-v2.14.1-linux-amd64.tgz)
- Find the `helm` binary in the unpacked directory, and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/arena-helm)
Then run `arena-helm list` to check whether the Kubernetes cluster can be managed successfully by helm.
```
# arena-helm list
# echo $?
0
```
4\. Download the charts
```
mkdir /charts
git clone https://github.com/kubeflow/arena.git
cp -r arena/charts/* /charts
```
5\. Install TFJob Controller
```
kubectl create -f arena/kubernetes-artifacts/jobmon/jobmon-role.yaml
kubectl create -f arena/kubernetes-artifacts/tf-operator/tf-crd.yaml
kubectl create -f arena/kubernetes-artifacts/tf-operator/tf-operator.yaml
```
6\. Install Dashboard
```
kubectl create -f arena/kubernetes-artifacts/dashboard/dashboard.yaml
```
7\. Install MPIJob Controller
```
kubectl create -f arena/kubernetes-artifacts/mpi-operator/mpi-operator.yaml
```
8\. Build arena
Prerequisites:
- Go >= 1.8
```
mkdir -p $(go env GOPATH)/src/github.com/kubeflow
cd $(go env GOPATH)/src/github.com/kubeflow
git clone https://github.com/kubeflow/arena.git
cd arena
make
```
The `arena` binary is located in the `arena/bin` directory. You may want to add the directory to `$PATH`.
9\. Install and configure kube-arbitrator for gang scheduling (optional)
```
kubectl create -f arena/kubernetes-artifacts/kube-batchd/kube-batched.yaml
```
10\. Enable shell autocompletion
On Linux, please use bash
On CentOS Linux, you may need to install the bash-completion package, which is not installed by default.
```
yum install bash-completion -y
```
To add arena autocompletion to your current shell, run `source <(arena completion bash)`.
To add arena autocompletion to your profile so it is automatically loaded in future shells, run:
```
echo "source <(arena completion bash)" >> ~/.bashrc
```
Then you can use [tab] to auto-complete the command:
```
# arena list
NAME STATUS TRAINER AGE NODE
tf1 PENDING TFJOB 0s N/A
caffe-1080ti-1 RUNNING HOROVOD 45s 192.168.1.120
# arena get [tab]
caffe-1080ti-1 tf1
```
11\. Enable Host network for training (optional)
Training does not use the host network (`useHostNetwork`) by default. If you'd like to run training on the host network, you can run the command below:
```
find /charts/ -name values.yaml | xargs sed -i "/useHostNetwork/s/false/true/g"
```
12\. Enable Loadbalancer in the public cloud (optional)
Kubernetes can be run on AWS, GCE, Azure, and Alibaba Cloud, and `LoadBalancer` is supported by their cloud providers. If you want to access tensorboard on the internet directly, you can run the command below:
```
find /charts/ -name "*.yaml" | xargs sed -i "s/NodePort/LoadBalancer/g"
```
> Warning: exposing the service to the internet is discouraged, because the service can easily be attacked by hackers.
13\. Enable Ingress in the public cloud (optional)
If you have an ingress controller configured, you can access tensorboard through ingress. You can run the command below:
```
find /charts/ -name values.yaml | xargs sed -i "/ingress/s/false/true/g"
```
> Warning: exposing the service to the internet is discouraged, because the service can easily be attacked by hackers.
14\. Change imagePullPolicy from `Always` to `IfNotPresent` (optional)
```
find /charts/ -name values.yaml| xargs sed -i "s/Always/IfNotPresent/g"
```
> Warning: this may cause the docker images to be out of date if they have already been downloaded on the node.

View File

@ -1,154 +0,0 @@
## Setup
This documentation assumes you have a Kubernetes cluster already available.
If you need help setting up a Kubernetes cluster, please refer to [Kubernetes Setup](https://kubernetes.io/docs/setup/).
If you want to use GPUs, be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
Arena doesn't have to run within the Kubernetes cluster; it can also run on your laptop. If you can run `kubectl` to manage the Kubernetes cluster, you can also use `arena` to manage training jobs.
### Requirements
* Kubernetes >= 1.11, kubectl >= 1.11
* helm version [v2.14.1](https://docs.helm.sh/using_helm/#installing-helm) or later
* tiller with the same version as helm should also be installed (https://docs.helm.sh/using_helm/#installing-tiller)
### Steps
1\. Prepare a kubeconfig file by using `export KUBECONFIG=/etc/kubernetes/admin.conf` or creating a `~/.kube/config`
2\. Install the kubectl client
Please follow the [kubectl installation guide](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
3\. Install the Helm client
- Download the Helm client from [github.com](https://github.com/helm/helm/releases)
- Unpack the downloaded file (tar -zxvf helm-v2.8.2-linux-amd64.tgz)
- Find the `helm` binary in the unpacked directory and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/arena-helm)
Then run `helm list` to check whether helm can manage the Kubernetes cluster successfully.
```
#helm list
#echo $?
0
```
4\. Download the charts
```
mkdir /charts
git clone https://github.com/kubeflow/arena.git
cp -r arena/charts/* /charts
```
5\. Install the TFJob controller
```
kubectl create -f arena/kubernetes-artifacts/jobmon/jobmon-role.yaml
kubectl create -f arena/kubernetes-artifacts/tf-operator/tf-crd.yaml
kubectl create -f arena/kubernetes-artifacts/tf-operator/tf-operator.yaml
```
6\. Install the dashboard (optional)
```
kubectl create -f arena/kubernetes-artifacts/dashboard/dashboard.yaml
```
7\. Install the MPIJob controller
```
kubectl create -f arena/kubernetes-artifacts/mpi-operator/mpi-operator.yaml
```
8\. Install arena
Prerequisites:
- Go >= 1.8
```
mkdir -p $(go env GOPATH)/src/github.com/kubeflow
cd $(go env GOPATH)/src/github.com/kubeflow
git clone https://github.com/kubeflow/arena.git
cd arena
make
```
The `arena` binary is located in the `arena/bin` directory. You may want to add the directory to `$PATH`.
9\. Install and configure kube-arbitrator for gang scheduling (optional)
```
kubectl create -f arena/kubernetes-artifacts/kube-batchd/kube-batched.yaml
```
10\. Enable shell autocompletion
On Linux, please use bash
On CentOS Linux, you may need to install the bash-completion package, which is not installed by default.
```
yum install bash-completion -y
```
To add arena autocompletion to your current shell, run `source <(arena completion bash)`.
To add arena autocompletion to your profile so it is automatically loaded in future shells, run:
```
echo "source <(arena completion bash)" >> ~/.bashrc
```
Then you can use [TAB] to auto-complete the command
```
#arena list
NAME STATUS TRAINER AGE NODE
tf1 PENDING TFJOB 0s N/A
caffe-1080ti-1 RUNNING HOROVOD 45s 192.168.1.120
#arena get [tab]
caffe-1080ti-1 tf1
```
11\. Enable the host network for training (optional)
Training does not use the host network (`useHostNetwork`) by default. If you'd like to run training on the host network, you can run the command below:
```
find /charts/ -name values.yaml | xargs sed -i "/useHostNetwork/s/false/true/g"
```
12\. Enable Loadbalancer in the public cloud
Kubernetes can be run on AWS, GCE, Azure, and Alibaba Cloud, and `LoadBalancer` is supported by their cloud providers. If you want to access tensorboard on the internet directly, you can run the command below:
```
find /charts/ -name "*.yaml" | xargs sed -i "s/NodePort/LoadBalancer/g"
```
> Warning: exposing the service to the internet is discouraged, because the service can easily be attacked by hackers.
13\. Enable Ingress in the public cloud
Kubernetes can be run on AWS, GCE, Azure, and Alibaba Cloud, and `Ingress` is supported by their cloud providers. If you want to access tensorboard through a unified entry point on the internet, you can run the command below:
```
find /charts/ -name values.yaml | xargs sed -i "/ingress/s/false/true/g"
```
> Warning: exposing the service to the internet is discouraged, because the service can easily be attacked by hackers.
14\. Change imagePullPolicy from `Always` to `IfNotPresent` (optional)
```
find /charts/ -name values.yaml| xargs sed -i "s/Always/IfNotPresent/g"
```
> Warning: this may cause the container images to be out of date if they have already been downloaded on the node.

Binary file not shown.


View File

@ -1,138 +0,0 @@
Here is an example of how you can use `Arena` for machine learning training. It will download the source code from a git URL.
1. The first step is to check the available resources:
```
arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
i-j6c68vrtpvj708d9x6j0 192.168.1.116 master 0 0
i-j6c8ef8d9sqhsy950x7x 192.168.1.119 worker 1 0
i-j6c8ef8d9sqhsy950x7y 192.168.1.120 worker 1 0
i-j6c8ef8d9sqhsy950x7z 192.168.1.118 worker 1 0
i-j6ccue91mx9n2qav7qsm 192.168.1.115 master 0 0
i-j6ce09gzdig6cfcy1lwr 192.168.1.117 master 0 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/3 (0%)
```
There are 3 available nodes with GPU for running training jobs.
2\. Now we can submit a training job with `arena`; it will download the source code from GitHub
```
# arena submit tf \
--name=tf-git \
--gpus=1 \
--image=tensorflow/tensorflow:1.5.0-devel-gpu \
--sync-mode=git \
--sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
"python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
configmap/tf-git-tfjob created
configmap/tf-git-tfjob labeled
tfjob.kubeflow.org/tf-git created
INFO[0000] The Job tf-git has been submitted successfully
INFO[0000] You can run `arena get tf-git --type tfjob` to check the job status
```
> The source code will be downloaded and extracted to the `code/` directory of the working directory. The default working directory is `/root`; you can also specify one by using `--workingDir`. Also, you may specify the branch you are pulling code from by adding `--env GIT_SYNC_BRANCH=main` to the parameters while submitting the job.
> If you are using the private git repo, you can use the following command:
```
# arena submit tf \
--name=tf-git \
--gpus=1 \
--image=tensorflow/tensorflow:1.5.0-devel-gpu \
--syncMode=git \
--syncSource=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--env=GIT_SYNC_USERNAME=yourname \
--env=GIT_SYNC_PASSWORD=yourpwd \
"python code/tensorflow-sample-code/tfjob/docker/mnist/main.py"
```
Notice: `arena` uses [git-sync](https://github.com/kubernetes/git-sync/blob/master/cmd/git-sync/main.go) to sync up source code. You can set the environment variables defined in the git-sync project.
3\. List all the jobs
```
# arena list
NAME STATUS TRAINER AGE NODE
tf-git RUNNING tfjob 0s 192.168.1.120
```
4\. Check the resource usage of the job
```
# arena top job
NAME STATUS TRAINER AGE NODE GPU(Requests) GPU(Allocated)
tf-git RUNNING TFJOB 17s 192.168.1.120 1 1
Total Allocated GPUs of Training Job:
1
Total Requested GPUs of Training Job:
1
```
5\. Check the resource usage of the cluster
```
# arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
i-j6c68vrtpvj708d9x6j0 192.168.1.116 master 0 0
i-j6c8ef8d9sqhsy950x7x 192.168.1.119 worker 1 0
i-j6c8ef8d9sqhsy950x7y 192.168.1.120 worker 1 1
i-j6c8ef8d9sqhsy950x7z 192.168.1.118 worker 1 0
i-j6ccue91mx9n2qav7qsm 192.168.1.115 master 0 0
i-j6ce09gzdig6cfcy1lwr 192.168.1.117 master 0 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
1/3 (33%)
```
6\. Get the details of the specific job
```
# arena get tf-git
NAME STATUS TRAINER AGE INSTANCE NODE
tf-git RUNNING TFJOB 5s tf-git-tfjob-worker-0 192.168.1.120
```
7\. Check logs
```
# arena logs tf-git
2018-07-22T23:56:20.841129509Z WARNING:tensorflow:From code/tensorflow-sample-code/tfjob/docker/mnist/main.py:119: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
2018-07-22T23:56:20.841211064Z Instructions for updating:
2018-07-22T23:56:20.841217002Z
2018-07-22T23:56:20.841221287Z Future major versions of TensorFlow will allow gradients to flow
2018-07-22T23:56:20.841225581Z into the labels input on backprop by default.
2018-07-22T23:56:20.841229492Z
...
2018-07-22T23:57:11.842929868Z Accuracy at step 920: 0.967
2018-07-22T23:57:11.842933859Z Accuracy at step 930: 0.9646
2018-07-22T23:57:11.842937832Z Accuracy at step 940: 0.967
2018-07-22T23:57:11.842941362Z Accuracy at step 950: 0.9674
2018-07-22T23:57:11.842945487Z Accuracy at step 960: 0.9693
2018-07-22T23:57:11.842949067Z Accuracy at step 970: 0.9687
2018-07-22T23:57:11.842952818Z Accuracy at step 980: 0.9688
2018-07-22T23:57:11.842956775Z Accuracy at step 990: 0.9649
2018-07-22T23:57:11.842961076Z Adding run metadata for 999
```
8\. Get more information about the training job from the logviewer
```
# arena logviewer tf-git
Your LogViewer will be available on:
192.168.1.120:8080/tfjobs/ui/#/default/tf-git-tfjob
```
![](1-tfjob-logviewer.jpg)
Congratulations! You've run the first training job with `arena` successfully.

View File

@ -1,45 +0,0 @@
Arena supports RDMA for distributed training. We can allocate an RDMA device for worker jobs by adding the parameter `--rdma`.
1. Deploy the RDMA device plugin
```
# Deploy RDMA device plugin
kubectl create -f kubernetes-artifacts/rdma/rdma-config.yaml
kubectl create -f kubernetes-artifacts/rdma/device-plugin.yaml
```
2\. Label your node that has an InfiniBand device
```
# Label RDMA NODE
kubectl label node <your node> accelerator/rdma=true
```
```
# Check Device plugin status
kubectl -n arena-system get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
rdma-sriov-dp-ds 1 1 1 1 1 accelerator/rdma=true 46d
```
3\. Enable arena RDMA config
```
find /charts/ -name values.yaml | xargs sed -i "/enableRDMA/s/false/true/g"
```
4\. Submit a TensorFlow training job using RDMA
```
# arena submit mpi --name=mpi-dist \
--rdma \
--gpus=1 \
--workers=2 \
--image=uber/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--env=GIT_SYNC_BRANCH=cnn_tf_v1.9_compatible \
--syncMode=git \
--syncSource=https://github.com/tensorflow/benchmarks.git \
--tensorboard \
"mpirun python code/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3
--save_summaries_steps=10"
```

View File

@ -1,201 +0,0 @@
Arena supports and simplifies running distributed Spark jobs.
### 1. To run a distributed Spark job, you need to specify:
- The Spark job image, which contains the main class jar (required)
- The main class of your jar (required)
- The jar path in the container (required)
- The number of executors (default: 1)
- The CPU request of the driver pod (default: 1)
- The memory request of the driver pod (default: 500m)
- The CPU request of the executor pod (default: 1)
- The memory request of the executor pod (default: 500m)
### 2. How to create a Spark job image
Arena Spark jobs are based on spark-on-k8s-operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). You can create a Spark job image with the `docker-image-tool` script (https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images), as sketched below.
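For reference, a typical invocation of that script from an unpacked Spark distribution looks roughly like the following; the registry and tag are placeholders:
```
./bin/docker-image-tool.sh -r registry.example.com/myrepo -t v2.4.0 build
./bin/docker-image-tool.sh -r registry.example.com/myrepo -t v2.4.0 push
```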
### 3. How to use Arena Spark jobs
##### install spark operator
```
# arena-system is the default namespace; if it does not exist, please create it.
kubectl create -f arena/kubernetes-artifacts/spark-operator/spark-operator.yaml
```
##### create the rbac for the spark job
The Spark job needs the service account `spark` to create executors.
```
kubectl create -f arena/kubernetes-artifacts/spark-operator/spark-rbac.yaml
```
The default namespace is `default`. If you want to run Spark jobs in another namespace, you can change the namespace in spark-rbac.yaml and create a new service account.
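If you only need the service account itself in another namespace, a quick sketch is shown below; the namespace is a placeholder, and the role bindings in spark-rbac.yaml must be adjusted to match:
```
kubectl -n my-spark-namespace create serviceaccount spark
```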
##### submit a spark job
```
arena submit sparkjob --name=demo --image=registry.aliyuncs.com/acs/spark:v2.4.0 --main-class=org.apache.spark.examples.SparkPi --jar=local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
```
The result looks like the following.
```
configmap/demo-sparkjob created
configmap/demo-sparkjob labeled
sparkapplication.sparkoperator.k8s.io/demo created
INFO[0005] The Job demo has been submitted successfully
INFO[0005] You can run `arena get demo --type sparkjob` to check the job status
```
##### get spark job status
```
arena get --type=sparkjob demo
```
When the job succeeds, you will see the result below.
```
STATUS: SUCCEEDED
NAMESPACE: default
TRAINING DURATION: 15s
NAME STATUS TRAINER AGE INSTANCE NODE
demo1 SUCCEEDED SPARKJOB 1h demo1-driver N/A
```
##### watch the logs of the spark job
```
arena logs -f demo
```
You will get the logs of the Spark driver pod.
```
2019-05-08T08:25:21.904409561Z ++ id -u
2019-05-08T08:25:21.904639867Z + myuid=0
2019-05-08T08:25:21.904649704Z ++ id -g
2019-05-08T08:25:21.904901542Z + mygid=0
2019-05-08T08:25:21.904909072Z + set +e
2019-05-08T08:25:21.905241846Z ++ getent passwd 0
2019-05-08T08:25:21.905608733Z + uidentry=root:x:0:0:root:/root:/bin/ash
2019-05-08T08:25:21.905623028Z + set -e
2019-05-08T08:25:21.905629226Z + '[' -z root:x:0:0:root:/root:/bin/ash ']'
2019-05-08T08:25:21.905633894Z + SPARK_K8S_CMD=driver
2019-05-08T08:25:21.905757494Z + case "$SPARK_K8S_CMD" in
2019-05-08T08:25:21.90622059Z + shift 1
2019-05-08T08:25:21.906232126Z + SPARK_CLASSPATH=':/opt/spark/jars/*'
2019-05-08T08:25:21.906236316Z + env
2019-05-08T08:25:21.906239651Z + grep SPARK_JAVA_OPT_
2019-05-08T08:25:21.90624307Z + sort -t_ -k4 -n
2019-05-08T08:25:21.906585896Z + sed 's/[^=]*=\(.*\)/\1/g'
2019-05-08T08:25:21.906908601Z + readarray -t SPARK_EXECUTOR_JAVA_OPTS
2019-05-08T08:25:21.906917535Z + '[' -n '' ']'
2019-05-08T08:25:21.906999069Z + '[' -n '' ']'
2019-05-08T08:25:21.907003871Z + PYSPARK_ARGS=
2019-05-08T08:25:21.907006605Z + '[' -n '' ']'
2019-05-08T08:25:21.907008951Z + R_ARGS=
2019-05-08T08:25:21.907012105Z + '[' -n '' ']'
2019-05-08T08:25:21.907148385Z + '[' '' == 2 ']'
2019-05-08T08:25:21.907994286Z + '[' '' == 3 ']'
2019-05-08T08:25:21.908014459Z + case "$SPARK_K8S_CMD" in
2019-05-08T08:25:21.908018653Z + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
2019-05-08T08:25:21.908023924Z + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.20.90.160 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
2019-05-08T08:25:23.326681135Z 2019-05-08 08:25:23 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-05-08T08:25:23.829843117Z 2019-05-08 08:25:23 INFO SparkContext:54 - Running Spark version 2.4.0
2019-05-08T08:25:23.8529898Z 2019-05-08 08:25:23 INFO SparkContext:54 - Submitted application: Spark Pi
2019-05-08T08:25:23.94670344Z 2019-05-08 08:25:23 INFO SecurityManager:54 - Changing view acls to: root
2019-05-08T08:25:23.946735076Z 2019-05-08 08:25:23 INFO SecurityManager:54 - Changing modify acls to: root
2019-05-08T08:25:23.946740267Z 2019-05-08 08:25:23 INFO SecurityManager:54 - Changing view acls groups to:
2019-05-08T08:25:23.946744543Z 2019-05-08 08:25:23 INFO SecurityManager:54 - Changing modify acls groups to:
2019-05-08T08:25:23.946748767Z 2019-05-08 08:25:23 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-05-08T08:25:24.273960575Z 2019-05-08 08:25:24 INFO Utils:54 - Successfully started service 'sparkDriver' on port 7078.
2019-05-08T08:25:24.307632934Z 2019-05-08 08:25:24 INFO SparkEnv:54 - Registering MapOutputTracker
2019-05-08T08:25:24.339548141Z 2019-05-08 08:25:24 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-05-08T08:25:24.339577986Z 2019-05-08 08:25:24 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-05-08T08:25:24.340887925Z 2019-05-08 08:25:24 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-05-08T08:25:24.359682519Z 2019-05-08 08:25:24 INFO DiskBlockManager:54 - Created local directory at /var/data/spark-118b216d-2d39-4287-ad71-5b5d7c7195c9/blockmgr-5532fd8b-64b9-492c-b94d-308b55d60a71
2019-05-08T08:25:24.388529744Z 2019-05-08 08:25:24 INFO MemoryStore:54 - MemoryStore started with capacity 110.0 MB
2019-05-08T08:25:24.413347888Z 2019-05-08 08:25:24 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-05-08T08:25:24.560654618Z 2019-05-08 08:25:24 INFO log:192 - Logging initialized @2462ms
2019-05-08T08:25:24.654721075Z 2019-05-08 08:25:24 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-05-08T08:25:24.680943254Z 2019-05-08 08:25:24 INFO Server:419 - Started @2586ms
2019-05-08T08:25:24.715867156Z 2019-05-08 08:25:24 INFO AbstractConnector:278 - Started ServerConnector@7e97551f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-05-08T08:25:24.715897312Z 2019-05-08 08:25:24 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2019-05-08T08:25:24.76123501Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1450078a{/jobs,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.762173789Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@534ca02b{/jobs/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.763361524Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29a23c3d{/jobs/job,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.764374535Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6fe46b62{/jobs/job/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.764919809Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@591fd34d{/stages,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.765687152Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@61e45f87{/stages/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.766434602Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7c9b78e3{/stages/stage,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.769934319Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5491f68b{/stages/stage/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.769949155Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@736ac09a{/stages/pool,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.769966711Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ecd665{/stages/pool/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.77037559Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45394b31{/storage,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.772696599Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1ec7d8b3{/storage/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.772709487Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3b0ca5e1{/storage/rdd,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.773014833Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5bb3131b{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.77546416Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@54dcbb9f{/environment,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.775478151Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@74fef3f7{/environment/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.775882882Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2a037324{/executors,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.780702953Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@69eb86b4{/executors/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.780717178Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@585ac855{/executors/threadDump,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.78072195Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5bb8f9e2{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.793805533Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6a933be2{/static,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.808511998Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@378bd86d{/,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.808532751Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2189e7a7{/api,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.808537695Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@644abb8f{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.80854206Z 2019-05-08 08:25:24 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a411233{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-05-08T08:25:24.808546336Z 2019-05-08 08:25:24 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://demo1-1557303918993-driver-svc.default.svc:4040
2019-05-08T08:25:24.834767942Z 2019-05-08 08:25:24 INFO SparkContext:54 - Added JAR file:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar at spark://demo1-1557303918993-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.4.0.jar with timestamp 1557303924832
2019-05-08T08:25:26.274526541Z 2019-05-08 08:25:26 INFO ExecutorPodsAllocator:54 - Going to request 1 executors from Kubernetes.
2019-05-08T08:25:26.455658752Z 2019-05-08 08:25:26 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
2019-05-08T08:25:26.47651031Z 2019-05-08 08:25:26 INFO NettyBlockTransferService:54 - Server created on demo1-1557303918993-driver-svc.default.svc:7079
2019-05-08T08:25:26.476533172Z 2019-05-08 08:25:26 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-05-08T08:25:26.503099521Z 2019-05-08 08:25:26 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, demo1-1557303918993-driver-svc.default.svc, 7079, None)
2019-05-08T08:25:26.506168762Z 2019-05-08 08:25:26 INFO BlockManagerMasterEndpoint:54 - Registering block manager demo1-1557303918993-driver-svc.default.svc:7079 with 110.0 MB RAM, BlockManagerId(driver, demo1-1557303918993-driver-svc.default.svc, 7079, None)
2019-05-08T08:25:26.529524775Z 2019-05-08 08:25:26 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, demo1-1557303918993-driver-svc.default.svc, 7079, None)
2019-05-08T08:25:26.529543725Z 2019-05-08 08:25:26 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, demo1-1557303918993-driver-svc.default.svc, 7079, None)
2019-05-08T08:25:26.661414752Z 2019-05-08 08:25:26 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c777e7b{/metrics/json,null,AVAILABLE,@Spark}
2019-05-08T08:25:30.459756195Z 2019-05-08 08:25:30 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.20.90.161:52168) with ID 1
2019-05-08T08:25:30.534179215Z 2019-05-08 08:25:30 INFO KubernetesClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2019-05-08T08:25:30.679510273Z 2019-05-08 08:25:30 INFO BlockManagerMasterEndpoint:54 - Registering block manager 172.20.90.161:36718 with 110.0 MB RAM, BlockManagerId(1, 172.20.90.161, 36718, None)
2019-05-08T08:25:30.906713226Z 2019-05-08 08:25:30 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2019-05-08T08:25:30.93537711Z 2019-05-08 08:25:30 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
2019-05-08T08:25:30.936000643Z 2019-05-08 08:25:30 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2019-05-08T08:25:30.936506781Z 2019-05-08 08:25:30 INFO DAGScheduler:54 - Parents of final stage: List()
2019-05-08T08:25:30.938152322Z 2019-05-08 08:25:30 INFO DAGScheduler:54 - Missing parents: List()
2019-05-08T08:25:30.958509715Z 2019-05-08 08:25:30 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2019-05-08T08:25:31.128459296Z 2019-05-08 08:25:31 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 110.0 MB)
2019-05-08T08:25:31.172704042Z 2019-05-08 08:25:31 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 110.0 MB)
2019-05-08T08:25:31.178025215Z 2019-05-08 08:25:31 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on demo1-1557303918993-driver-svc.default.svc:7079 (size: 1256.0 B, free: 110.0 MB)
2019-05-08T08:25:31.182000364Z 2019-05-08 08:25:31 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
2019-05-08T08:25:31.202640906Z 2019-05-08 08:25:31 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
2019-05-08T08:25:31.203502967Z 2019-05-08 08:25:31 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2019-05-08T08:25:31.245126257Z 2019-05-08 08:25:31 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, 172.20.90.161, executor 1, partition 0, PROCESS_LOCAL, 7878 bytes)
2019-05-08T08:25:31.805815672Z 2019-05-08 08:25:31 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 172.20.90.161:36718 (size: 1256.0 B, free: 110.0 MB)
2019-05-08T08:25:31.946492966Z 2019-05-08 08:25:31 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, 172.20.90.161, executor 1, partition 1, PROCESS_LOCAL, 7878 bytes)
2019-05-08T08:25:31.957903365Z 2019-05-08 08:25:31 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 727 ms on 172.20.90.161 (executor 1) (1/2)
2019-05-08T08:25:31.99308236Z 2019-05-08 08:25:31 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 47 ms on 172.20.90.161 (executor 1) (2/2)
2019-05-08T08:25:31.994764897Z 2019-05-08 08:25:31 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-05-08T08:25:31.995390219Z 2019-05-08 08:25:31 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.998 s
2019-05-08T08:25:32.003622135Z 2019-05-08 08:25:32 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 1.094511 s
2019-05-08T08:25:32.005407995Z Pi is roughly 3.1436157180785904
2019-05-08T08:25:32.011499948Z 2019-05-08 08:25:32 INFO AbstractConnector:318 - Stopped Spark@7e97551f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-05-08T08:25:32.014105609Z 2019-05-08 08:25:32 INFO SparkUI:54 - Stopped Spark web UI at http://demo1-1557303918993-driver-svc.default.svc:4040
2019-05-08T08:25:32.01861939Z 2019-05-08 08:25:32 INFO KubernetesClusterSchedulerBackend:54 - Shutting down all executors
2019-05-08T08:25:32.019973046Z 2019-05-08 08:25:32 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Asking each executor to shut down
2019-05-08T08:25:32.025136562Z 2019-05-08 08:25:32 WARN ExecutorPodsWatchSnapshotSource:87 - Kubernetes client has been closed (this is expected if the application is shutting down.)
2019-05-08T08:25:32.087137746Z 2019-05-08 08:25:32 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-05-08T08:25:32.097659039Z 2019-05-08 08:25:32 INFO MemoryStore:54 - MemoryStore cleared
2019-05-08T08:25:32.098360561Z 2019-05-08 08:25:32 INFO BlockManager:54 - BlockManager stopped
2019-05-08T08:25:32.104432515Z 2019-05-08 08:25:32 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-05-08T08:25:32.10761075Z 2019-05-08 08:25:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-05-08T08:25:32.114734944Z 2019-05-08 08:25:32 INFO SparkContext:54 - Successfully stopped SparkContext
2019-05-08T08:25:32.117170277Z 2019-05-08 08:25:32 INFO ShutdownHookManager:54 - Shutdown hook called
2019-05-08T08:25:32.118273045Z 2019-05-08 08:25:32 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bdb4e416-5ab7-420c-905e-ef43c30fb187
2019-05-08T08:25:32.120019227Z 2019-05-08 08:25:32 INFO ShutdownHookManager:54 - Deleting directory /var/data/spark-118b216d-2d39-4287-ad71-5b5d7c7195c9/spark-06dbab1f-13aa-474c-a1db-8845e14627bf
```
##### delete spark job
```
arena delete --type=sparkjob demo
```
You will see that the spark job has been deleted.
```
sparkapplication.sparkoperator.k8s.io "demo1" deleted
time="2019-05-08T17:27:06+08:00" level=info msg="The Job demo1 has been deleted successfully"
configmap "demo1-sparkjob" deleted
```
Congratulations! You've run the distributed spark job with `arena` successfully.


@ -1,156 +0,0 @@
# Arena supports and simplifies volcano jobs.
Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms currently missing from
Kubernetes that are commonly required by many classes of batch & elastic workload including:
1. machine learning/deep learning,
2. bioinformatics/genomics, and
3. other "big data" applications.
## prerequisites
- a k8s deployment
- deploy volcano following the steps in kubernetes-artifacts/volcano-operator/README.md
### 1. To run a batch/distributed volcano job, you may need to specify:
```
--minAvailable int The minimal available pods to run for this Job. default value is 1 (default 1)
--name string override name
--queue string Specifies the queue that will be used in the scheduler, default queue is used this leaves empty (default "default")
--schedulerName string Specifies the scheduler Name, default is volcano when not specified (default "volcano")
--taskCPU string cpu request for each task replica / pod. default value is 250m (default "250m")
--taskImages strings the docker images of different tasks of volcano job. default used 3 tasks with ubuntu,nginx and busybox images (default [ubuntu,nginx,busybox])
--taskMemory string memory request for each task replica/pod.default value is 128Mi) (default "128Mi")
--taskName string the task name of volcano job, default value is task (default "task")
--taskPort int the task port number. default value is 2222 (default 2222)
--taskReplicas int the task replica's number to run the distributed tasks. default value is 1 (default 1)
```
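Putting several of these flags together, an illustrative example (the job name and values are hypothetical):
```
arena submit volcanojob --name demo-batch \
  --taskName task \
  --taskImages busybox,busybox \
  --taskReplicas 2 \
  --taskCPU 500m \
  --taskMemory 256Mi \
  --minAvailable 2
```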
### 2. More information about volcano jobs
Arena volcano jobs are based on [volcano](https://github.com/volcano-sh/volcano).
You can get more information about volcano from https://volcano.sh/
### 3. How to use Arena volcano job
##### install volcano
Deploy volcano following the steps in kubernetes-artifacts/volcano-operator/README.md.
To install the chart with the release name `volcano-release`:
```bash
$ helm install --name volcano-release kubernetes-artifacts/volcano-operator
```
To verify that all deployments are running, use the command below:
```bash
kubectl get deployment --all-namespaces | grep {release_name}
```
We should get output similar to the example below, where the three deployments for the controller, admission, and scheduler are running.
```bash
NAME READY UP-TO-DATE AVAILABLE AGE
{release_name}-admission 1/1 1 1 4s
{release_name}-controllers 1/1 1 1 4s
{release_name}-scheduler 1/1 1 1 4s
```
To verify that all pods are running, use the command below:
```bash
kubectl get pods --all-namespaces | grep {release_name}
```
We should get output similar to the example below, where the pods for admission, admission-init, controllers, and scheduler are running.
```bash
NAMESPACE NAME READY STATUS RESTARTS AGE
default volcano-release-admission-cbfdb8549-dz5hg 1/1 Running 0 33s
default volcano-release-admission-init-7xmzd 0/1 Completed 0 33s
default volcano-release-controllers-7967fffb8d-7vnn9 1/1 Running 0 33s
default volcano-release-scheduler-746f6557d8-9pfg6 1/1 Running 0 33s
```
##### submit a volcano job
```
arena submit volcanojob --name=demo
```
The output looks like this:
```
configmap/demo-volcanojob created
configmap/demo-volcanojob labeled
job.batch.volcano.sh/demo created
INFO[0003] The Job demo has been submitted successfully
INFO[0003] You can run `arena get demo --type volcanojob` to check the job status
```
If we want to provide more command-line parameters:
```
./bin/arena submit volcanojob --name demo12 --taskImages busybox,busybox --taskReplicas 2
```
In the above case it creates two tasks, each with 2 replicas, as shown below:
```
arena get --type volcanojob demo12
```
The result is as below.
```
STATUS: SUCCEEDED
NAMESPACE: default
TRAINING DURATION: 2m
NAME STATUS TRAINER AGE INSTANCE NODE
demo12 SUCCEEDED VOLCANOJOB 2m demo12-task-0-0 11.245.101.184
demo12 SUCCEEDED VOLCANOJOB 2m demo12-task-0-1 11.245.101.184
demo12 SUCCEEDED VOLCANOJOB 2m demo12-task-1-0 11.245.101.184
demo12 SUCCEEDED VOLCANOJOB 2m demo12-task-1-1 11.245.101.184
```
##### get volcano job status
```
arena get --type=volcanojob demo
```
When the job is running or has succeeded, you will see the result below.
```
STATUS: RUNNING/SUCCEEDED
NAMESPACE: default
TRAINING DURATION: 45s
NAME STATUS TRAINER AGE INSTANCE NODE
demo SUCCEEDED VOLCANOJOB 59s demo-task-0-0 11.245.101.184
demo RUNNING VOLCANOJOB 59s demo-task-1-0 11.245.101.184
demo SUCCEEDED VOLCANOJOB 59s demo-task-2-0 11.245.101.184
```
##### list arena jobs
```
arena list
```
We can observe the data below:
```
NAME STATUS TRAINER AGE NODE
demo RUNNING VOLCANOJOB 2m 11.245.101.184
```
##### delete volcano job
```
arena delete --type=volcanojob demo
```
You will see that the volcano job has been deleted.
```
job.batch.volcano.sh "demo" deleted
configmap "demo-volcanojob" deleted
INFO[0000] The Job demo has been deleted successfully
```
Congratulations! You've run the batch/distributed volcano job with `arena` successfully.


@ -1,169 +0,0 @@
# Arena supports Priority and Preemption for MPIJob
## prerequisites
- k8s > 1.11
1. Create a `PriorityClass` with the yaml below:
```yaml
apiVersion: scheduling.k8s.io/v1beta1
description: Used for the critical app
kind: PriorityClass
metadata:
  name: critical
value: 1100000
---
apiVersion: scheduling.k8s.io/v1beta1
description: Used for the medium app
kind: PriorityClass
metadata:
  name: medium
value: 1000000
```
Save the template above in a file named `pc.yaml`, and create the `PriorityClass`:
```
kubectl create -f pc.yaml
```
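You can check that both classes exist:
```
kubectl get priorityclass critical medium
```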
2. There is only 1 GPU available in the Kubernetes cluster:
```
# arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
192.168.0.20 192.168.0.20 master 0 0
192.168.0.21 192.168.0.21 master 0 0
192.168.0.22 192.168.0.22 master 0 0
192.168.0.23 192.168.0.23 <none> 1 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/1 (0%)
```
3. Run the MPI training job with `medium` priority. The following command is an example:
```
# arena submit mpi \
--name=medium \
--priority=medium \
--gpus=1 \
--workers=1 \
--image=registry.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
"mpirun tail -f /dev/null"
configmap/medium-mpijob created
configmap/medium-mpijob labeled
mpijob.kubeflow.org/medium created
INFO[0000] The Job medium has been submitted successfully
INFO[0000] You can run `arena get medium --type mpijob` to check the job status
```
4. Get the details of the specific job:
```
# arena get medium
STATUS: RUNNING
NAMESPACE: default
PRIORITY: medium
TRAINING DURATION: 58s
NAME STATUS TRAINER AGE INSTANCE NODE
medium RUNNING MPIJOB 58s medium-launcher-sz5xj 192.168.0.23
medium RUNNING MPIJOB 58s medium-worker-0 192.168.0.23
```
5. The only GPU is used by the MPI training job `medium`:
```
# arena top node -d
NAME: cn-hangzhou.192.168.0.23
IPADDRESS: 192.168.0.23
ROLE: <none>
NAMESPACE NAME GPU REQUESTS GPU LIMITS
default medium-worker-0 1 1
Total GPUs In Node cn-hangzhou.192.168.0.23: 1
Allocated GPUs In Node cn-hangzhou.192.168.0.23: 1 (100%)
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster: 1/1 (100%)
```
6. Run the MPI training job with `critical` priority:
```
# arena submit mpi \
--name=critical \
--priority=critical \
--gpus=1 \
--workers=1 \
--image=registry.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
"mpirun tail -f /dev/null"
```
7. Check the MPI training job `medium` and find that it has been preempted by `critical-worker-0`:
```
# kubectl get events --field-selector involvedObject.name=medium-worker-0
LAST SEEN TYPE REASON OBJECT MESSAGE
15m Normal Scheduled pod/medium-worker-0 Successfully assigned default/medium-worker-0 to 192.168.0.23
14m Normal Pulled pod/medium-worker-0 Container image "registry.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5" already present on machine
14m Normal Created pod/medium-worker-0 Created container mpi
14m Normal Started pod/medium-worker-0 Started container mpi
2m32s Normal Preempted pod/medium-worker-0 by default/critical-worker-0 on node 192.168.0.23
2m32s Normal Killing pod/medium-worker-0 Stopping container mpi
```
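To confirm which priority class each pod carries, you can inspect the pod spec (the pod name is taken from this example):
```
kubectl get pod critical-worker-0 -o jsonpath='{.spec.priorityClassName}{"\n"}'
```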
8. Check the details of the MPI training job `medium`; it has turned to FAILED:
```
# arena get medium
STATUS: FAILED
NAMESPACE: default
PRIORITY: medium
TRAINING DURATION: 12m
NAME STATUS TRAINER AGE INSTANCE NODE
medium FAILED MPIJOB 20m medium-launcher-sz5xj 192.168.0.23
```
9. Check the details of the MPI training job `critical`; it is running:
```
# arena get critical
STATUS: RUNNING
NAMESPACE: default
PRIORITY: critical
TRAINING DURATION: 10m
NAME STATUS TRAINER AGE INSTANCE NODE
critical RUNNING MPIJOB 10m critical-launcher-mfffs 192.168.0.23
critical RUNNING MPIJOB 10m critical-worker-0 192.168.0.23
```
10. We can see that the only GPU is used by the MPI training job `critical`:
```
# arena top node -d
NAME: cn-hangzhou.192.168.0.23
IPADDRESS: 192.168.0.23
ROLE: <none>
NAMESPACE NAME GPU REQUESTS GPU LIMITS
default critical-worker-0 1 1
Total GPUs In Node cn-hangzhou.192.168.0.23: 1
Allocated GPUs In Node cn-hangzhou.192.168.0.23: 1 (100%)
-----------------------------------------------------------------------------------------
```
Congratulations! You've run jobs with priority and preemption using `arena` successfully.


@ -1,160 +0,0 @@
Arena supports assigning jobs to particular k8s nodes (currently only MPI jobs and TF jobs are supported).
Here are some usage examples.
1. Query the k8s cluster information:
```
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-beijing.192.168.3.225 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.226 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.227 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.228 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.229 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.230 Ready <none> 2d22h v1.12.6-aliyun.1
```
2. Give labels to some nodes. For example, give the label "gpu_node=ok" to nodes "cn-beijing.192.168.3.228" and "cn-beijing.192.168.3.229", and the label "ssd_node=ok" to node "cn-beijing.192.168.3.230":
```
# kubectl label nodes cn-beijing.192.168.3.228 gpu_node=ok
node/cn-beijing.192.168.3.228 labeled
# kubectl label nodes cn-beijing.192.168.3.229 gpu_node=ok
node/cn-beijing.192.168.3.229 labeled
# kubectl label nodes cn-beijing.192.168.3.230 ssd_node=ok
node/cn-beijing.192.168.3.230 labeled
```
## for MPI job
1. When submitting a job, you can choose the nodes it runs on with the "--selector" option:
```
# arena submit mpi --name=mpi-dist \
--gpus=1 \
--workers=1 \
--selector gpu_node=ok \
--image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--tensorboard \
--loglevel debug \
"mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
2. Query the job information:
```
# arena get mpi-dist
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 21s
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 21s mpi-dist-launcher-7jn4q 192.168.3.229
mpi-dist RUNNING MPIJOB 21s mpi-dist-worker-0 192.168.3.229
Your tensorboard will be available on:
http://192.168.3.225:31611
```
The job instances are running on node cn-beijing.192.168.3.229 (IP 192.168.3.229).
3. You can use "--selector" multiple times. For example, you can use "--selector gpu_node=ok --selector ssd_node=ok" in an arena submit command, which means that the job should run on nodes that carry both the label "gpu_node=ok" and the label "ssd_node=ok", as sketched below.
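A sketch of such a command (the job name and training command are hypothetical; note that in the cluster labeled above no node carries both labels, so this particular job would stay pending, and it only illustrates the syntax):
```
arena submit mpi --name=mpi-dist-both \
  --gpus=1 \
  --workers=1 \
  --selector gpu_node=ok \
  --selector ssd_node=ok \
  --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
  "mpirun tail -f /dev/null"
```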
## for tf job
1. Because there are four roles ("PS", "Worker", "Evaluator", "Chief") in a tf job, you can use "--selector" to assign nodes, and it takes effect for all roles. For example:
```
arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--selector ssd_node=ok \
--workerImage=cheyang/tf-mnist-distributed:gpu \
--psImage=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--loglevel debug \
"python /app/main.py"
```
Use the following command to check the job status:
```
# arena get tf
STATUS: PENDING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 24s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 24s tf-ps-0 192.168.3.230
tf PENDING TFJOB 24s tf-worker-0 192.168.3.230
Your tensorboard will be available on:
http://192.168.3.225:31867
```
The job instances (including "PS" and "Worker") are running on cn-beijing.192.168.3.230 (IP 192.168.3.230, label "ssd_node=ok").
2. You can also assign nodes per role. For example, if you want to run the "PS" role on nodes with the label ssd_node=ok and the "Worker" role on nodes with the label gpu_node=ok, you can use the options "--ps-selector" and "--worker-selector":
```
arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--ps-selector ssd_node=ok \
--worker-selector gpu_node=ok \
--workerImage=cheyang/tf-mnist-distributed:gpu \
--psImage=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--loglevel debug \
"python /app/main.py"
```
Then check the job's status:
```
# arena get tf
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 23s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 23s tf-ps-0 192.168.3.230
tf RUNNING TFJOB 23s tf-worker-0 192.168.3.228
Your tensorboard will be available on:
http://192.168.3.225:30162
```
the "PS" job is running on cn-beijing.192.168.3.230(ip is 192.168.3.230,label is "ssd_node=ok") and the "Worker" job is running on cn-beijing.192.168.3.228(ip is 192.168.3.228,label is "gpu_node=ok")
3. If you use "--selector" in an "arena submit tf" command and also use "--ps-selector" (or "--worker-selector", "--evaluator-selector", "--chief-selector"), the role-specific value overrides the value of "--selector". For example:
```
arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--ps-selector ssd_node=ok \
--selector gpu_node=ok \
--workerImage=cheyang/tf-mnist-distributed:gpu \
--psImage=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--loglevel debug \
"python /app/main.py"
```
"PS" job will be running on nodes whose label is "ssd_node=ok",other jobs will be running on nodes whose label is "gpu_node=ok",now verify our conclusions,use follow command to check job status.
```
# arena get tf
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 39s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 39s tf-ps-0 192.168.3.230
tf RUNNING TFJOB 39s tf-worker-0 192.168.3.228
Your tensorboard will be available on:
http://192.168.3.225:32105
```
As you can see, the "PS" instance is running on a node with the label "ssd_node=ok", and the other instances are running on nodes with the label "gpu_node=ok".


@ -1,85 +0,0 @@
Arena supports submitting a job that tolerates k8s node taints (currently only MPI jobs and TF jobs are supported).
Here are some usage examples.
1. Query the k8s cluster information:
```
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-beijing.192.168.3.225 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.226 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.227 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.228 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.229 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.230 Ready <none> 2d22h v1.12.6-aliyun.1
```
2. Add taints to some k8s nodes. For example, add the taint "gpu_node=invalid:NoSchedule" to nodes "cn-beijing.192.168.3.228" and "cn-beijing.192.168.3.229", and the taint "ssd_node=invalid:NoSchedule" to node "cn-beijing.192.168.3.230"; now no k8s pods can be scheduled to these nodes:
```
# kubectl taint nodes cn-beijing.192.168.3.228 gpu_node=invalid:NoSchedule
node/cn-beijing.192.168.3.228 tainted
# kubectl taint nodes cn-beijing.192.168.3.229 gpu_node=invalid:NoSchedule
node/cn-beijing.192.168.3.229 tainted
# kubectl taint nodes cn-beijing.192.168.3.230 ssd_node=invalid:NoSchedule
node/cn-beijing.192.168.3.230 tainted
```
3. When submitting a job, you can tolerate tainted nodes with the "--toleration" option:
```
# arena submit mpi --name=mpi-dist \
--gpus=1 \
--workers=1 \
--toleration ssd_node \
--image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--tensorboard \
--loglevel debug \
"mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
Query the job information:
```
# arena get mpi-dist
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 29s
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 29s mpi-dist-launcher-jgms7 192.168.3.230
mpi-dist RUNNING MPIJOB 29s mpi-dist-worker-0 192.168.3.230
Your tensorboard will be available on:
http://192.168.3.225:30052
```
The job instances are running on node cn-beijing.192.168.3.230 (IP 192.168.3.230, taint "ssd_node=invalid").
4. You can use "--toleration" multiple times. For example, you can use "--toleration gpu_node --toleration ssd_node" in an arena submit command, which means that the job tolerates nodes carrying the taint "gpu_node=invalid" as well as nodes carrying the taint "ssd_node=invalid".
```
# arena submit mpi --name=mpi-dist \
--gpus=1 \
--workers=1 \
--toleration ssd_node \
--toleration gpu_node \
--image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--tensorboard \
--loglevel debug \
"mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
Query the job status:
```
# arena get mpi-dist
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 29s
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 29s mpi-dist-launcher-jgms7 192.168.3.229
mpi-dist RUNNING MPIJOB 29s mpi-dist-worker-0 192.168.3.230
Your tensorboard will be available on:
http://192.168.3.225:30052
```
5. You can use "--toleration all" to tolerate all node taints, as sketched below.
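A sketch of such a command (the job name and training command are hypothetical):
```
arena submit mpi --name=mpi-dist-all \
  --gpus=1 \
  --workers=1 \
  --toleration all \
  --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
  "mpirun tail -f /dev/null"
```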

Binary file not shown (image, 183 KiB).

Binary file not shown (image, 290 KiB).


@ -1,80 +0,0 @@
# Serving Trained Model with arena
You can use arena to deploy your trained model as RESTful APIs. To illustrate the usage, we use the sample project [fast-style-transfer](https://github.com/floydhub/fast-style-transfer). To save time, we use its trained model and add the model to the docker image.
### 1. Serve mode
We use the app.py script in the project to start the RESTful server. You can use arena to deploy the trained model:
```
# arena serve custom \
--name=fast-style-transfer \
--gpus=1 \
--version=alpha \
--replicas=1 \
--restful-port=5000 \
--image=happy365/fast-style-transfer:latest \
"python app.py"
```
Check the status of the custom serving job:
```
# arena serve list
NAME TYPE VERSION DESIRED AVAILABLE ENDPOINT_ADDRESS PORTS
fast-style-transfer CUSTOM alpha 1 0 172.21.8.94 grpc:8001,restful:5000
```
Because the docker image is very large, pulling it takes some time; we can use kubectl to check the pod status:
```
# kubectl get po
NAME READY STATUS RESTARTS AGE
fast-style-transfer-alpha-custom-serving-845ffbf7dd-btbhj 0/1 ContainerCreating 0 6m44s
```
### 2. Access the service
We can use a client to access the service. Run the following command to create a client:
```
# kubectl run sample-client \
--generator=run-pod/v1 \
--image=happy365/arena-serve-custem-sample-client:latest \
--command -- \
/bin/sleep infinity
```
Then we can query the status of the sample-client:
```
# kubectl get po sample-client
NAME READY STATUS RESTARTS AGE
sample-client 1/1 Running 0 87s
```
We need the service name, which is a combination of the job name and the version (the sample job name is fast-style-transfer and the version is alpha):
```
# kubectl get svc fast-style-transfer-alpha
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fast-style-transfer-alpha ClusterIP 172.21.1.114 <none> 5000/TCP 31m
```
Now we can use the "kubectl exec" command to log in to the container:
```
# kubectl exec -ti sample-client /bin/sh
#
```
then we use "curl" command to access the custom serving job:
```
# curl -o /root/output/beijing_out.jpg -F "file=@/root/input/beijing.jpg" http://fast-style-transfer-alpha:5000
```
The input is an image named "beijing.jpg" ![beijing.jpg](15-custom-serving-sample-beijing.jpg), stored in "/root/input"; the output is stored in "/root/output". You can use the "kubectl cp" command to copy the output image from the container to the host:
```
# kubectl cp sample-client:/root/output/beijing_out.jpg ~/beijing_out.jpg
```
Now you can view the image at ~/beijing_out.jpg; here is "beijing_out.jpg" ![beijing_out.jpg](15-custom-serving-sample-beijing_out.jpg)


@ -1,73 +0,0 @@
# Assign configuration files for jobs
You can pass configuration files to containers when submitting jobs.
This feature only supports the following job types:
* tfjob
* mpijob
## 1. Usage
You can use `--config-file <host_path_file>:<container_path_file>` to assign a configuration file to a container. There are some rules:
* If <host_path_file> is assigned and <container_path_file> is not, <container_path_file> is taken to be the same as <host_path_file>.
* <container_path_file> must be a file with an absolute path.
* You can use `--config-file` more than once in a command, e.g. "--config-file /tmp/test1.conf:/etc/config/test1.conf --config-file /tmp/test2.conf:/etc/config/test2.conf".
## 2. Sample
First, we create a test file named "test-config.json" whose path is "/tmp/test-config.json". We want to push this file to the containers of a tfjob (or mpijob), with the in-container path "/etc/config/config.json".
```
# cat /tmp/test-config.json
{
"key": "job-config"
}
```
Second, use the following command to create the tfjob:
```
# arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--workerImage=cheyang/tf-mnist-distributed:gpu \
--psImage=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--config-file /tmp/test-config.json:/etc/config/config.json \
"python /app/main.py"
```
Wait a minute, then get the job status:
```
# arena get tf
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 16s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 16s tf-ps-0 192.168.7.18
tf RUNNING TFJOB 16s tf-worker-0 192.168.7.16
Your tensorboard will be available on:
http://192.168.7.10:31825
```
Use kubectl to check whether the file is in the containers:
```
# kubectl exec -ti tf-ps-0 -- cat /etc/config/config.json
{
"key": "job-config"
}
# kubectl exec -ti tf-worker-0 -- cat /etc/config/config.json
{
"key": "job-config"
}
```
As you can see, the file is in the containers.


@ -1,95 +0,0 @@
This example shows how to use `Arena` to submit a pytorch stand-alone job. The example downloads the source code from a git URL.
1. The first step is to check the available resources.
```
➜ arena top node
NAME IPADDRESS ROLE STATUS GPU(Total) GPU(Allocated)
cn-huhehaote.172.16.0.205 172.16.0.205 master ready 0 0
cn-huhehaote.172.16.0.206 172.16.0.206 master ready 0 0
cn-huhehaote.172.16.0.207 172.16.0.207 master ready 0 0
cn-huhehaote.172.16.0.208 172.16.0.208 <none> ready 4 0
cn-huhehaote.172.16.0.209 172.16.0.209 <none> ready 4 0
cn-huhehaote.172.16.0.210 172.16.0.210 <none> ready 4 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/12 (0%)
```
There are 3 available nodes with GPU for running training jobs.
2. Submit a pytorch training job; this example downloads the source code from [Alibaba Cloud code](https://code.aliyun.com/370272561/mnist-pytorch.git).
```
# Single gpu card
➜ arena --loglevel info submit pytorch \
--name=pytorch-local-git \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
"python /root/code/mnist-pytorch/mnist.py --backend gloo"
configmap/pytorch-local-git-pytorchjob created
configmap/pytorch-local-git-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-local-git created
INFO[0000] The Job pytorch-local-git has been submitted successfully
INFO[0000] You can run `arena get pytorch-local-git --type pytorchjob` to check the job status
```
> The source code will be downloaded and extracted to the `code/` directory under the working directory. The default working directory is `/root`; you can also specify it by using `--workingDir`.
> If you are using a private git repo, you can use the following command:
```
➜ arena --loglevel info submit pytorch \
--name=pytorch-local-git \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--env=GIT_SYNC_USERNAME=yourname \
--env=GIT_SYNC_PASSWORD=yourpwd \
"python /root/code/mnist-pytorch/mnist.py --backend gloo"
```
3. List all the jobs.
```
➜ arena list
NAME STATUS TRAINER AGE NODE
pytorch-local-git SUCCEEDED PYTORCHJOB 21h N/A
```
4. Get the details of this job.
```
➜ arena get pytorch-local-git
STATUS: SUCCEEDED
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 35s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-local-git SUCCEEDED PYTORCHJOB 23h pytorch-local-git-master-0 172.16.0.210
```
5. Check logs.
```
➜ arena logs pytorch-local-git
WORLD_SIZE: 1, CURRENT_RANK: 0
args: Namespace(backend='gloo', batch_size=64, data='/root/code/mnist-pytorch', dir='/root/code/mnist-pytorch/logs', epochs=1, log_interval=10, lr=0.01, momentum=0.5, no_cuda=False, save_model=False, seed=1, test_batch_size=1000)
Using CUDA
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Train Epoch: 1 [0/60000 (0%)] loss=2.3000
Train Epoch: 1 [640/60000 (1%)] loss=2.2135
Train Epoch: 1 [1280/60000 (2%)] loss=2.1705
Train Epoch: 1 [1920/60000 (3%)] loss=2.0767
Train Epoch: 1 [2560/60000 (4%)] loss=1.8681
...
```


@ -1,131 +0,0 @@
This example shows how to use `Arena` to submit a pytorch distributed job. The example downloads the source code from a git URL.
1. The first step is to check the available resources.
```
➜ arena top node
NAME IPADDRESS ROLE STATUS GPU(Total) GPU(Allocated)
cn-huhehaote.172.16.0.205 172.16.0.205 master ready 0 0
cn-huhehaote.172.16.0.206 172.16.0.206 master ready 0 0
cn-huhehaote.172.16.0.207 172.16.0.207 master ready 0 0
cn-huhehaote.172.16.0.208 172.16.0.208 <none> ready 4 0
cn-huhehaote.172.16.0.209 172.16.0.209 <none> ready 4 0
cn-huhehaote.172.16.0.210 172.16.0.210 <none> ready 4 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/12 (0%)
```
There are 3 available nodes with GPU for running training jobs.
2. Submit a pytorch distributed training job with 2 nodes and one gpu card each; this example downloads the source code from [Alibaba Cloud code](https://code.aliyun.com/370272561/mnist-pytorch.git).
```
➜ arena --loglevel info submit pytorch \
--name=pytorch-dist-git \
--gpus=1 \
--workers=2 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
"python /root/code/mnist-pytorch/mnist.py --backend gloo"
configmap/pytorch-dist-git-pytorchjob created
configmap/pytorch-dist-git-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-dist-git created
INFO[0000] The Job pytorch-dist-git has been submitted successfully
INFO[0000] You can run `arena get pytorch-dist-git --type pytorchjob` to check the job status
```
> The source code will be downloaded and extracted to the `code/` directory under the working directory. The default working directory is `/root`; you can also specify it by using `--workingDir`.
> `workers` is the total number of nodes participating in the training (a positive integer greater than or equal to 1), including the rank0 node used to establish communication (corresponding to the `master` node in pytorch-operator). The default value is 1, in which case it can be omitted and the job runs as a stand-alone job.
3. List all the jobs.
```
➜ arena list
NAME STATUS TRAINER AGE NODE
pytorch-dist-git SUCCEEDED PYTORCHJOB 23h N/A
```
4. Get the details of this job. There are 2 instances of this job, and the instance `pytorch-dist-git-master-0` is rank0. Arena simplifies the process of submitting distributed jobs with `PyTorch-Operator`.
A `Service` is created for the `master` instance so that other nodes can reach it through the `Service` name in `PyTorch-Operator`, and the following environment variables are injected into each instance: `MASTER_PORT`, `MASTER_ADDR`, `WORLD_SIZE`, and `RANK`. These are what PyTorch needs to initialize the distributed process group (`dist.init_process_group`). `MASTER_PORT` is assigned automatically; `MASTER_ADDR` is "localhost" in the `master` instance and the `Service` name of the `master` in the other instances; `WORLD_SIZE` is the total number of instances; and `RANK` is the serial number of the current node: 0 for the `master`, and for a `Worker` instance the index in its instance name suffix plus one. For example, in the following example, the `RANK` of instance `pytorch-dist-git-worker-0` is `0 + 1 = 1`.
In Arena, the value of the parameter `--workers` includes the `master` instance, because the `master` also takes part in training.
```
➜ arena get pytorch-dist-git
STATUS: SUCCEEDED
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 1m
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-dist-git SUCCEEDED PYTORCHJOB 23h pytorch-dist-git-master-0 172.16.0.210
pytorch-dist-git SUCCEEDED PYTORCHJOB 23h pytorch-dist-git-worker-0 172.16.0.210
```
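While the instances are still running, you can look at the injected variables directly; a sketch using the worker pod name from this example:
```
kubectl exec pytorch-dist-git-worker-0 -- env | grep -E '^(MASTER_ADDR|MASTER_PORT|WORLD_SIZE|RANK)='
```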
5. Check logs.
```
➜ arena logs pytorch-dist-git
WORLD_SIZE: 2, CURRENT_RANK: 0
args: Namespace(backend='gloo', batch_size=64, data='/root/code/mnist-pytorch', dir='/root/code/mnist-pytorch/logs', epochs=1, log_interval=10, lr=0.01, momentum=0.5, no_cuda=False, save_model=False, seed=1, test_batch_size=1000)
Using CUDA
Using distributed PyTorch with gloo backend
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Train Epoch: 1 [0/60000 (0%)] loss=2.3000
Train Epoch: 1 [640/60000 (1%)] loss=2.2135
Train Epoch: 1 [1280/60000 (2%)] loss=2.1705
Train Epoch: 1 [1920/60000 (3%)] loss=2.0767
Train Epoch: 1 [2560/60000 (4%)] loss=1.8681
Train Epoch: 1 [3200/60000 (5%)] loss=1.4142
Train Epoch: 1 [3840/60000 (6%)] loss=1.0009
...
```
> For a distributed job with multiple instances, the default output is the log of rank0 (the `master` instance). If you want to view the log of a specific instance, select it with `-i <instance-name>`, for example:
```
➜ arena logs pytorch-dist-git -i pytorch-dist-git-worker-0
WORLD_SIZE: 2, CURRENT_RANK: 1
args: Namespace(backend='gloo', batch_size=64, data='/root/code/mnist-pytorch', dir='/root/code/mnist-pytorch/logs', epochs=1, log_interval=10, lr=0.01, momentum=0.5, no_cuda=False, save_model=False, seed=1, test_batch_size=1000)
Using CUDA
Using distributed PyTorch with gloo backend
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Train Epoch: 1 [0/60000 (0%)] loss=2.3000
Train Epoch: 1 [640/60000 (1%)] loss=2.2135
Train Epoch: 1 [1280/60000 (2%)] loss=2.1705
Train Epoch: 1 [1920/60000 (3%)] loss=2.0767
Train Epoch: 1 [2560/60000 (4%)] loss=1.8681
Train Epoch: 1 [3200/60000 (5%)] loss=1.4142
```
> In addition, users can view the last few lines of the logs through the parameter `-t <lines>`, for example:
```
➜ arena logs pytorch-dist-git -i pytorch-dist-git-worker-0 -t 5
Train Epoch: 1 [58880/60000 (98%)] loss=0.2048
Train Epoch: 1 [59520/60000 (99%)] loss=0.0646
accuracy=0.9661
```
> For more parameters, see `arena logs --help`.


@ -1,75 +0,0 @@
This example shows how to use `Arena` to submit a pytorch distributed job and visualize it with `Tensorboard`. The sample downloads the source code from a git URL.
1. The first step is to check the available resources.
```
➜ arena top node
NAME IPADDRESS ROLE STATUS GPU(Total) GPU(Allocated)
cn-huhehaote.172.16.0.205 172.16.0.205 master ready 0 0
cn-huhehaote.172.16.0.206 172.16.0.206 master ready 0 0
cn-huhehaote.172.16.0.207 172.16.0.207 master ready 0 0
cn-huhehaote.172.16.0.208 172.16.0.208 <none> ready 4 0
cn-huhehaote.172.16.0.209 172.16.0.209 <none> ready 4 0
cn-huhehaote.172.16.0.210 172.16.0.210 <none> ready 4 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/12 (0%)
```
There are 3 available nodes with GPU for running training jobs.
2. Submit a pytorch distributed training job with 2 nodes and one gpu card each; this example downloads the source code from [Alibaba Cloud code](https://code.aliyun.com/370272561/mnist-pytorch.git).
```
➜ arena --loglevel info submit pytorch \
--name=pytorch-dist-tensorboard \
--gpus=1 \
--workers=2 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--tensorboard \
--logdir=/root/logs \
"python /root/code/mnist-pytorch/mnist.py --epochs 50 --backend gloo --dir /root/logs"
configmap/pytorch-dist-tensorboard-pytorchjob created
configmap/pytorch-dist-tensorboard-pytorchjob labeled
service/pytorch-dist-tensorboard-tensorboard created
deployment.apps/pytorch-dist-tensorboard-tensorboard created
pytorchjob.kubeflow.org/pytorch-dist-tensorboard created
INFO[0000] The Job pytorch-dist-tensorboard has been submitted successfully
INFO[0000] You can run `arena get pytorch-dist-tensorboard --type pytorchjob` to check the job status
```
> The source code will be downloaded and extracted to the `code/` directory under the working directory. The default working directory is `/root`; you can also specify it by using `--workingDir`.
> `workers` is the total number of nodes participating in the training (a positive integer greater than or equal to 1), including the rank0 node used to establish communication (corresponding to the `master` node in pytorch-operator). The default value is 1, in which case it can be omitted and the job runs as a stand-alone job.
> `logdir` indicates where tensorboard reads the event logs of PyTorch.
3. List all the jobs.
```
➜ arena list
NAME STATUS TRAINER AGE NODE
pytorch-dist-tensorboard SUCCEEDED PYTORCHJOB 22h N/A
```
4. Get the details of the this job.
```
➜ arena get pytorch-dist-tensorboard
STATUS: SUCCEEDED
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 15m
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-dist-tensorboard SUCCEEDED PYTORCHJOB 22h pytorch-dist-tensorboard-master-0 172.16.0.210
pytorch-dist-tensorboard SUCCEEDED PYTORCHJOB 22h pytorch-dist-tensorboard-worker-0 172.16.0.210
Your tensorboard will be available on:
http://172.16.0.205:30583
```
> Notice: you can access the tensorboard by using `172.16.0.205:30583`. You can consider `sshuttle` if you can't access the tensorboard directly from your laptop. For example:
```
# you can install sshuttle==0.74 on your Mac with Python 2.7
➜ pip install sshuttle==0.74
# 0/0 -> 0.0.0.0/0
➜ sshuttle -r root@39.104.17.205 0/0
```
![](19-pytorchjob-tensorboard.png)

Binary file not shown (image, 879 KiB).

Binary file not shown (image, 413 KiB).


@ -1,109 +0,0 @@
Here is an example of how you can use `Arena` for machine learning training. It downloads the source code from a git URL and uses Tensorboard to visualize the TensorFlow computation graph and plot quantitative metrics.
1. The first step is to check the available resources:
```
arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
i-j6c68vrtpvj708d9x6j0 192.168.1.116 master 0 0
i-j6c8ef8d9sqhsy950x7x 192.168.1.119 worker 1 0
i-j6c8ef8d9sqhsy950x7y 192.168.1.120 worker 1 0
i-j6c8ef8d9sqhsy950x7z 192.168.1.118 worker 1 0
i-j6ccue91mx9n2qav7qsm 192.168.1.115 master 0 0
i-j6ce09gzdig6cfcy1lwr 192.168.1.117 master 0 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/3 (0%)
```
There are 3 available nodes with GPU for running training jobs.
2\. Now we can submit a training job with the `arena` CLI; it will download the source code from github:
```
# arena submit tf \
--name=tf-tensorboard \
--gpus=1 \
--image=tensorflow/tensorflow:1.5.0-devel-gpu \
--env=TEST_TMPDIR=code/tensorflow-sample-code/ \
--syncMode=git \
--syncSource=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--tensorboard \
--logdir=/training_logs \
"python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 5000"
configmap/tf-tensorboard-tfjob created
configmap/tf-tensorboard-tfjob labeled
service/tf-tensorboard-tensorboard created
deployment.extensions/tf-tensorboard-tensorboard created
tfjob.kubeflow.org/tf-tensorboard created
INFO[0001] The Job tf-tensorboard has been submitted successfully
INFO[0001] You can run `arena get tf-tensorboard --type tfjob` to check the job status
```
> The source code will be downloaded and extracted to the `code/` directory under the working directory. The default working directory is `/root`; you can also specify it with `--workingDir`.
> `logdir` indicates the directory from which tensorboard reads the TensorFlow event logs.
3\. List all the jobs
```
# arena list
NAME STATUS TRAINER AGE NODE
tf-tensorboard RUNNING TFJOB 0s 192.168.1.119
```
4\. Check the resource usage of the job
```
# arena top job
NAME STATUS TRAINER AGE NODE GPU(Requests) GPU(Allocated)
tf-tensorboard RUNNING TFJOB 26s 192.168.1.119 1 1
Total Allocated GPUs of Training Job:
0
Total Requested GPUs of Training Job:
1
```
5\. Check the resource usage of the cluster
```
# arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
i-j6c68vrtpvj708d9x6j0 192.168.1.116 master 0 0
i-j6c8ef8d9sqhsy950x7x 192.168.1.119 worker 1 1
i-j6c8ef8d9sqhsy950x7y 192.168.1.120 worker 1 0
i-j6c8ef8d9sqhsy950x7z 192.168.1.118 worker 1 0
i-j6ccue91mx9n2qav7qsm 192.168.1.115 master 0 0
i-j6ce09gzdig6cfcy1lwr 192.168.1.117 master 0 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
1/3 (33%)
```
6\. Get the details of the specific job
```
# arena get tf-tensorboard
NAME STATUS TRAINER AGE INSTANCE NODE
tf-tensorboard RUNNING tfjob 15s tf-tensorboard-tfjob-586fcf4d6f-vtlxv 192.168.1.119
tf-tensorboard RUNNING tfjob 15s tf-tensorboard-tfjob-worker-0 192.168.1.119
Your tensorboard will be available on:
192.168.1.117:30670
```
> Notice: you can access the tensorboard by using `192.168.1.117:30670`. You can consider `sshuttle` if you can't access the tensorboard directly from your laptop. For example: `sshuttle -r root@47.89.59.51 192.168.0.0/16`
![](2-tensorboard.jpg)
Congratulations! You've run the training job with `arena` successfully, and you can also check the tensorboard easily.

View File

@ -1,123 +0,0 @@
This example shows how to use `Arena` to submit a pytorch distributed job and mount an NFS data volume. The sample downloads the source code from a git URL.
1. Set up an NFS server (refer to: https://www.cnblogs.com/weifeng1463/p/10037803.html).
```shell
# install nfs server
➜ yum install nfs-utils -y
# Create local directory of NFS server
➜ mkdir -p /root/nfs/data
# Configure nfs server
➜ cat /etc/exports
/root/nfs/data *(rw,no_root_squash)
# Start nfs server
➜ systemctl start nfs; systemctl start rpcbind
➜ systemctl enable nfs
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
```
2. Download training data to shared directory of NFS.
```shell
# Get information of NFS server by showmount, 172.16.0.200 is the host ip of NFS server
➜ showmount -e 172.16.0.200
Export list for 172.16.0.200:
/root/nfs/data *
# Enter shared directory
➜ cd /root/nfs/data
# Prepare training data to shared directory
➜ pwd
/root/nfs/data
# MNIST -> That's the training data we need
➜ ll
total 8.0K
drwxr-xr-x 4 502 games 4.0K Jun 17 16:05 data
drwxr-xr-x 4 root root 4.0K Jun 23 15:17 MNIST
```
3. Create PV.
```shell
# Note: Typesetting may cause yaml indentation problems
➜ cat nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pytorchdata
  labels:
    pytorchdata: nas-mnist
spec:
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 172.16.0.200
    path: "/root/nfs/data"
➜ kubectl create -f nfs-pv.yaml
persistentvolume/pytorchdata created
➜ kubectl get pv | grep pytorchdata
pytorchdata 10Gi RWX Retain Bound default/pytorchdata 7m38s
```
4. Create PVC.
```shell
➜ cat nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pytorchdata
  annotations:
    description: "this is the mnist demo"
    owner: Tom
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      pytorchdata: nas-mnist
➜ kubectl create -f nfs-pvc.yaml
persistentvolumeclaim/pytorchdata created
➜ kubectl get pvc | grep pytorchdata
pytorchdata Bound pytorchdata 10Gi RWX 2m3s
```
5. Check the data volume.
```shell
➜ arena data list
NAME ACCESSMODE DESCRIPTION OWNER AGE
pytorchdata ReadWriteMany this is the mnist demo Tom 2m
```
6. Submit the pytorch job, mounting the distributed storage volume with `--data pvc_name:container_path`.
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-data \
--gpus=1 \
--workers=2 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--data=pytorchdata:/mnist_data \
"python /root/code/mnist-pytorch/mnist.py --backend gloo --data /mnist_data/data"
configmap/pytorch-data-pytorchjob created
configmap/pytorch-data-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-data created
INFO[0000] The Job pytorch-data has been submitted successfully
INFO[0000] You can run `arena get pytorch-data --type pytorchjob` to check the job status
```
7. Get the status of volume `pytorchdata` in one of the instances with `kubectl describe`.
```shell
# Get the details of the this job
➜ arena get pytorch-data
STATUS: SUCCEEDED
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 56s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-data SUCCEEDED PYTORCHJOB 1m pytorch-data-master-0 172.16.0.210
pytorch-data SUCCEEDED PYTORCHJOB 1m pytorch-data-worker-0 172.16.0.210
# Get status of volume `pytorchdata` from `pytorch-data-master-0`
➜ kubectl describe pod pytorch-data-master-0 | grep pytorchdata -C 3
```
![](20-pytorchjob-distributed-data.png)

View File

@ -1,54 +0,0 @@
## Arena supports assigning pytorch jobs to particular k8s nodes
1. Get k8s cluster information:
```shell
➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-huhehaote.172.16.0.205 Ready master 4h19m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.206 Ready master 4h18m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.207 Ready master 4h17m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.208 Ready <none> 4h13m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.209 Ready <none> 4h13m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.210 Ready <none> 4h13m v1.16.9-aliyun.1
```
2. Give labels to nodes, for example:
```shell
# 172.16.0.208 label gpu_node=ok
➜ kubectl label nodes cn-huhehaote.172.16.0.208 gpu_node=ok
node/cn-huhehaote.172.16.0.208 labeled
# 172.16.0.209 label gpu_node=ok
➜ kubectl label nodes cn-huhehaote.172.16.0.209 gpu_node=ok
node/cn-huhehaote.172.16.0.209 labeled
# 172.16.0.210 label ssd_node=ok
➜ kubectl label nodes cn-huhehaote.172.16.0.210 ssd_node=ok
node/cn-huhehaote.172.16.0.210 labeled
```
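To verify that the labels took effect, you can list the nodes by label (a quick check with plain kubectl):
```shell
# list only the nodes carrying the gpu_node=ok label
➜ kubectl get nodes -l gpu_node=ok
```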
3. When submitting a pytorch job, you can use `--selector` to decide which nodes the job runs on:
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-selector \
--gpus=1 \
--workers=2 \
--selector gpu_node=ok \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
"python /root/code/mnist-pytorch/mnist.py --backend gloo"
configmap/pytorch-selector-pytorchjob created
configmap/pytorch-selector-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-selector created
INFO[0000] The Job pytorch-selector has been submitted successfully
INFO[0000] You can run `arena get pytorch-selector --type pytorchjob` to check the job status
```
4. Get the job details; you can see that the job runs only on the node with IP 172.16.0.209, which carries the label `gpu_node=ok`.
```shell
➜ arena get pytorch-selector
STATUS: PENDING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 14s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-selector PENDING PYTORCHJOB 14s pytorch-selector-master-0 172.16.0.209
pytorch-selector PENDING PYTORCHJOB 14s pytorch-selector-worker-0 172.16.0.209
```
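If you later want to remove this placement constraint, deleting the label is enough; a sketch:
```shell
# remove the gpu_node label from a node (note the trailing minus)
➜ kubectl label nodes cn-huhehaote.172.16.0.208 gpu_node-
```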

View File

@ -1,96 +0,0 @@
## Arena supports submitting a pytorch job that tolerates k8s node taints
1. Get k8s cluster information:
```shell
➜ kubectl get node
NAME STATUS ROLES AGE VERSION
cn-huhehaote.172.16.0.205 Ready master 5h13m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.206 Ready master 5h12m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.207 Ready master 5h11m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.208 Ready <none> 5h7m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.209 Ready <none> 5h7m v1.16.9-aliyun.1
cn-huhehaote.172.16.0.210 Ready <none> 5h7m v1.16.9-aliyun.1
```
2. Add some taints to k8s nodes, for example:
```shell
# taint --> gpu_node
➜ kubectl taint nodes cn-huhehaote.172.16.0.208 gpu_node=invalid:NoSchedule
node/cn-huhehaote.172.16.0.208 tainted
➜ kubectl taint nodes cn-huhehaote.172.16.0.209 gpu_node=invalid:NoSchedule
node/cn-huhehaote.172.16.0.209 tainted
# taint --> ssd_node
➜ kubectl taint nodes cn-huhehaote.172.16.0.210 ssd_node=invalid:NoSchedule
node/cn-huhehaote.172.16.0.210 tainted
```
3. If we tainted the wrong nodes or want to restore a node's schedulability, we can remove the taints with the following commands:
```shell
➜ kubectl taint nodes cn-huhehaote.172.16.0.208 gpu_node-
node/cn-huhehaote.172.16.0.208 untainted
➜ kubectl taint nodes cn-huhehaote.172.16.0.209 gpu_node-
node/cn-huhehaote.172.16.0.209 untainted
➜ kubectl taint nodes cn-huhehaote.172.16.0.210 ssd_node-
node/cn-huhehaote.172.16.0.210 untainted
```
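You can check the current taints on a node with `kubectl describe`; for example (expected output after the taints from step 2 are applied):
```shell
➜ kubectl describe node cn-huhehaote.172.16.0.208 | grep Taints
Taints:             gpu_node=invalid:NoSchedule
```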
4. When submitting a job, you can tolerate tainted nodes with the `--toleration` option, for example `--toleration gpu_node`. This parameter can be used multiple times with different taint keys.
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-toleration \
--gpus=1 \
--workers=2 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--tensorboard \
--logdir=/root/logs \
--toleration gpu_node \
"python /root/code/mnist-pytorch/mnist.py --epochs 50 --backend gloo --dir /root/logs"
configmap/pytorch-toleration-pytorchjob created
configmap/pytorch-toleration-pytorchjob labeled
service/pytorch-toleration-tensorboard created
deployment.apps/pytorch-toleration-tensorboard created
pytorchjob.kubeflow.org/pytorch-toleration created
INFO[0000] The Job pytorch-toleration has been submitted successfully
INFO[0000] You can run `arena get pytorch-toleration --type pytorchjob` to check the job status
```
5. Get the details of this job.
```shell
➜ arena get pytorch-toleration
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 2m
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-toleration RUNNING PYTORCHJOB 2m pytorch-toleration-master-0 172.16.0.209
pytorch-toleration RUNNING PYTORCHJOB 2m pytorch-toleration-worker-0 172.16.0.209
Your tensorboard will be available on:
http://172.16.0.205:32091
```
6. You can use `--toleration all` to tolerate all node taints.
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-toleration-all \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--toleration all \
"python /root/code/mnist-pytorch/mnist.py --epochs 10 --backend gloo"
configmap/pytorch-toleration-all-pytorchjob created
configmap/pytorch-toleration-all-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-toleration-all created
INFO[0000] The Job pytorch-toleration-all has been submitted successfully
INFO[0000] You can run `arena get pytorch-toleration-all --type pytorchjob` to check the job status
```
7. Get the details of this job.
```shell
➜ arena get pytorch-toleration-all
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 33s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-toleration-all RUNNING PYTORCHJOB 33s pytorch-toleration-all-master-0 172.16.0.210
```

View File

@ -1,49 +0,0 @@
## Assign configuration files for pytorch jobs
You can pass configuration files to containers when submitting jobs.
1. Prepare the configuration file to be mounted on the submitted machine.
```shell
# prepare your config-file
➜ cat /tmp/test-config.json
{
"key": "job-config"
}
```
2. Submit the job, and specify the configuration file to mount with `--config-file`.
```shell
# submit the job with --config-file ${host-config-file}:${container-config-file}
# this parameter can be repeated to mount multiple configuration files
➜ arena --loglevel info submit pytorch \
--name=pytorch-config-file \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--config-file /tmp/test-config.json:/etc/config/config.json \
"python /root/code/mnist-pytorch/mnist.py --epochs 50 --backend gloo"
configmap/pytorch-config-file-pytorchjob created
configmap/pytorch-config-file-pytorchjob labeled
configmap/pytorch-config-file-a9cbad1b8719778 created
pytorchjob.kubeflow.org/pytorch-config-file created
INFO[0000] The Job pytorch-config-file has been submitted successfully
INFO[0000] You can run `arena get pytorch-config-file --type pytorchjob` to check the job status
```
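As the comments above note, `--config-file` can be repeated; a hypothetical sketch mounting a second file (the extra file and its target path are made up for illustration):
```shell
➜ arena --loglevel info submit pytorch \
    --name=pytorch-config-file \
    ... \
    --config-file /tmp/test-config.json:/etc/config/config.json \
    --config-file /tmp/extra-config.json:/etc/config/extra.json \
    "python /root/code/mnist-pytorch/mnist.py --epochs 50 --backend gloo"
```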
3. Get the details of this job.
```shell
➜ arena get pytorch-config-file --type pytorchjob
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 51s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-config-file RUNNING PYTORCHJOB 51s pytorch-config-file-master-0 172.16.0.210
```
4. Use kubectl to check whether the file is in the container:
```
➜ kubectl exec -ti pytorch-config-file-master-0 -- cat /etc/config/config.json
{
"key": "job-config"
}
```

View File

@ -1,130 +0,0 @@
## Arena supports Priority and Preemption for pytorch jobs
1. Create `PriorityClass` objects with the yaml below. There are two priorities defined here: `critical` and `medium`.
```shell
# declarations of critical and medium
➜ cat priorityClass.yaml
apiVersion: scheduling.k8s.io/v1beta1
description: Used for the critical app
kind: PriorityClass
metadata:
  name: critical
value: 1100000
---
apiVersion: scheduling.k8s.io/v1beta1
description: Used for the medium app
kind: PriorityClass
metadata:
  name: medium
value: 1000000
# Create two priority objects: critical and medium
➜ kubectl create -f priorityClass.yaml
priorityclass.scheduling.k8s.io/critical created
priorityclass.scheduling.k8s.io/medium created
```
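You can confirm that both objects exist before submitting jobs:
```shell
➜ kubectl get priorityclass critical medium
# output will look similar to:
NAME       VALUE     GLOBAL-DEFAULT   AGE
critical   1100000   false            10s
medium     1000000   false            10s
```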
2. Check the available resources. There are 3 GPU nodes in total, and each node has 4 GPU cards.
```shell
➜ arena top node
NAME IPADDRESS ROLE STATUS GPU(Total) GPU(Allocated)
cn-huhehaote.172.16.0.205 172.16.0.205 master ready 0 0
cn-huhehaote.172.16.0.206 172.16.0.206 master ready 0 0
cn-huhehaote.172.16.0.207 172.16.0.207 master ready 0 0
cn-huhehaote.172.16.0.208 172.16.0.208 <none> ready 4 0
cn-huhehaote.172.16.0.209 172.16.0.209 <none> ready 4 0
cn-huhehaote.172.16.0.210 172.16.0.210 <none> ready 4 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/12 (0%)
```
3. Submit a `medium`-priority GPU job that uses 3 nodes with 4 cards each, occupying all GPU resources. To verify the preemption effect, we increase the number of training epochs so the job runs long enough to observe.
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-priority-medium \
--gpus=4 \
--workers=3 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--priority=medium \
"python /root/code/mnist-pytorch/mnist.py --backend gloo --epochs 200"
configmap/pytorch-priority-medium-pytorchjob created
configmap/pytorch-priority-medium-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-priority-medium created
INFO[0000] The Job pytorch-priority-medium has been submitted successfully
INFO[0000] You can run `arena get pytorch-priority-medium --type pytorchjob` to check the job status
```
4. Get the details of this job. You can see that it is running.
```shell
➜ arena get pytorch-priority-medium
STATUS: RUNNING
NAMESPACE: default
PRIORITY: medium
TRAINING DURATION: 3m
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-priority-medium RUNNING PYTORCHJOB 3m pytorch-priority-medium-master-0 172.16.0.208
pytorch-priority-medium RUNNING PYTORCHJOB 3m pytorch-priority-medium-worker-0 172.16.0.210
pytorch-priority-medium RUNNING PYTORCHJOB 3m pytorch-priority-medium-worker-1 172.16.0.209
```
5. Check the GPU usage; all cards are occupied.
```shell
➜ arena top node
NAME IPADDRESS ROLE STATUS GPU(Total) GPU(Allocated)
cn-huhehaote.172.16.0.205 172.16.0.205 master ready 0 0
cn-huhehaote.172.16.0.206 172.16.0.206 master ready 0 0
cn-huhehaote.172.16.0.207 172.16.0.207 master ready 0 0
cn-huhehaote.172.16.0.208 172.16.0.208 <none> ready 4 4
cn-huhehaote.172.16.0.209 172.16.0.209 <none> ready 4 4
cn-huhehaote.172.16.0.210 172.16.0.210 <none> ready 4 4
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
12/12 (100%)
```
6. Submit a job with priority of `critical` to initiate preemption.
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-priority-critical \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--priority=critical \
"python /root/code/mnist-pytorch/mnist.py --backend gloo --epochs 50"
configmap/pytorch-priority-critical-pytorchjob created
configmap/pytorch-priority-critical-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-priority-critical created
INFO[0000] The Job pytorch-priority-critical has been submitted successfully
INFO[0000] You can run `arena get pytorch-priority-critical --type pytorchjob` to check the job status
```
7. Get the details of this job.
```shell
➜ arena get pytorch-priority-critical
STATUS: RUNNING
NAMESPACE: default
PRIORITY: critical
TRAINING DURATION: 22s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-priority-critical RUNNING PYTORCHJOB 22s pytorch-priority-critical-master-0 172.16.0.208
```
8. Check the status of the `medium`-priority job. It has become `FAILED`, and one instance has been deleted due to preemption.
```shell
➜ arena get pytorch-priority-medium
STATUS: FAILED
NAMESPACE: default
PRIORITY: medium
TRAINING DURATION: 1m
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-priority-medium FAILED PYTORCHJOB 2m pytorch-priority-medium-master-0 172.16.0.210
pytorch-priority-medium FAILED PYTORCHJOB 2m pytorch-priority-medium-worker-0 172.16.0.209
```
9. Check the events of `pytorch-priority-medium`; you can see that its `pytorch-priority-medium-worker-1` has been evicted. The reason is that `pytorch-priority-critical-master-0` also requested the resources of this node, and the node had no spare GPU resources, so the low-priority job was preempted by the high-priority job.
```shell
➜ kubectl get events --field-selector involvedObject.name=pytorch-priority-medium-worker-1
```
![](24-pytorchjob-preempted.png)

View File

@ -1,40 +0,0 @@
## Specify the pod clean-up policy for a finished pytorch job
1. Submit a job, and specify `--clean-task-policy` as `All`. After the job finishes (`SUCCEEDED` or `FAILED`), all its instances (pods) will be deleted. The default is `None`, in which case all pods are retained.
```shell
➜ arena --loglevel info submit pytorch \
--name=pytorch-clean-policy \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--clean-task-policy=All \
"python /root/code/mnist-pytorch/mnist.py --backend gloo"
configmap/pytorch-clean-policy-pytorchjob created
configmap/pytorch-clean-policy-pytorchjob labeled
pytorchjob.kubeflow.org/pytorch-clean-policy created
INFO[0000] The Job pytorch-clean-policy has been submitted successfully
INFO[0000] You can run `arena get pytorch-clean-policy --type pytorchjob` to check the job status
```
2. Get the job details. After the job finishes, the instance `pytorch-clean-policy-master-0` is deleted.
```shell
# RUNNING
➜ arena get pytorch-clean-policy
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 18s
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-clean-policy RUNNING PYTORCHJOB 18s pytorch-clean-policy-master-0 172.16.0.209
# FINISHED
➜ arena get pytorch-clean-policy
STATUS: SUCCEEDED
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 37s
NAME STATUS TRAINER AGE INSTANCE NODE
```
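You can double-check with kubectl that no pods of the job remain (a quick sketch):
```shell
# after the job finishes with --clean-task-policy=All, this returns nothing
➜ kubectl get pods | grep pytorch-clean-policy
```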

View File

@ -1,168 +0,0 @@
# Submit the training jobs with ImagePullSecrets
You can use a private registry when submitting jobs (including tensorboard images).
Assume the following images are in your private registry.
```shell
# pytorch
registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard-secret:1.5.1-cuda10.1-cudnn7-runtime
# tf
registry.cn-huhehaote.aliyuncs.com/lumo/tensorflow:1.5.0-devel-gpu
# mpi
registry.cn-huhehaote.aliyuncs.com/lumo/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5
# tensorboard (--tensorboard-image)
registry.cn-huhehaote.aliyuncs.com/lumo/tensorflow:1.12.0-devel
```
## Contents
* <a href="#create_secret">Create ImagePullSecrets</a>
* <a href="#tfjob">TFJob With Secret</a>
* <a href="#mpijob">MPIJob With Secret</a>
* <a href="#pytorchjob">PyTorchJob With Secret</a>
* <a href="#arenaConfig">Load imagePullSecrets from configuration of Arena</a>
## <a name="create_secret">Create ImagePullSecrets</a>
* Create a [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) with kubectl. In this case, it's [imagePullSecrets](https://kubernetes.io/docs/concepts/containers/images/).
```shell script
kubectl create secret docker-registry [$Reg_Secret] --docker-server=[$Registry] --docker-username=[$Username] --docker-password=[$Password] --docker-email=[$Email]
```
> Note:
> [$Reg_Secret] is the name of the secret, which you can define yourself.
> [$Registry] is your private registry address.
> [$Username] is the username of your private registry.
> [$Password] is the password of your private registry.
> [$Email] is your email address (optional).
For Example:
```shell
kubectl create secret docker-registry \
lumo-secret \
--docker-server=registry.cn-huhehaote.aliyuncs.com \
--docker-username=******@test.aliyunid.com \
--docker-password=******
secret/lumo-secret created
```
You can check that the secret was created.
```shell
# kubectl get secrets | grep lumo-secret
lumo-secret kubernetes.io/dockerconfigjson 1 52s
```
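If you need to inspect what the secret contains, you can decode it (a sketch using the standard dockerconfigjson layout):
```shell
# print the registry credentials stored in the secret
kubectl get secret lumo-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
```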
## <a name="tfjob">TFJob With Secret</a>
Submit the job by using `--image-pull-secrets` to specify the imagePullSecrets.
1. Submit tf job.
```shell
arena submit tf \
--name=tf-git-with-secret \
--working-dir=/root \
--gpus=1 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/tensorflow:1.5.0-devel-gpu \
--sync-mode=git \
--sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--data=training-data:/mnist_data \
--tensorboard \
--tensorboard-image=registry.cn-huhehaote.aliyuncs.com/lumo/tensorflow:1.12.0-devel \
--logdir=/mnist_data/tf_data/logs \
--image-pull-secrets=lumo-secret \
"python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --log_dir /mnist_data/tf_data/logs --data_dir /mnist_data/tf_data/"
```
> Note:
> If you have many `imagePullSecrets` to use, you can use `--image-pull-secrets` multiple times.
```shell
arena submit tf \
--name=tf-git-with-secret \
... \
--image-pull-secrets=lumo-secret \
--image-pull-secrets=king-secret \
--image-pull-secrets=test-secret
...
```
2. Get the details of the job.
```shell
# arena get tf-git-with-secret
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 17s
NAME STATUS TRAINER AGE INSTANCE NODE
tf-git-with-secret RUNNING TFJOB 17s tf-git-with-secret-chief-0 172.16.0.202
Your tensorboard will be available on:
http://172.16.0.198:30080
```
## <a name="mpijob">MPIJob With Secret</a>
Submit the job by using `--image-pull-secrets` to specify the imagePullSecrets.
1. Submit mpi job.
```shell
arena submit mpi \
--name=mpi-dist-with-secret \
--gpus=1 \
--workers=2 \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--env=GIT_SYNC_BRANCH=cnn_tf_v1.9_compatible \
--sync-mode=git \
--sync-source=https://github.com/tensorflow/benchmarks.git \
--tensorboard \
--tensorboard-image=registry.cn-huhehaote.aliyuncs.com/lumo/tensorflow:1.12.0-devel \
--image-pull-secrets=lumo-secret \
"mpirun python code/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
2. Get the details of the job.
```shell
# arena get mpi-dist-with-secret
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 9m
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist-with-secret RUNNING MPIJOB 9m mpi-dist-with-secret-launcher-v8sgt 172.16.0.201
mpi-dist-with-secret RUNNING MPIJOB 9m mpi-dist-with-secret-worker-0 172.16.0.201
mpi-dist-with-secret RUNNING MPIJOB 9m mpi-dist-with-secret-worker-1 172.16.0.202
Your tensorboard will be available on:
http://172.16.0.198:30450
```
## <a name="pytorchjob">PyTorchJob With Secret</a>
Submit the job by using `--image-pull-secrets` to specify the imagePullSecrets.
1. Submit pytorch job.
```shell
arena submit pytorch \
--name=pytorch-git-with-secret \
--gpus=1 \
--working-dir=/root \
--image=registry.cn-huhehaote.aliyuncs.com/lumo/pytorch-with-tensorboard-secret:1.5.1-cuda10.1-cudnn7-runtime \
--sync-mode=git \
--sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
--data=training-data:/mnist_data \
--tensorboard \
--tensorboard-image=registry.cn-huhehaote.aliyuncs.com/lumo/tensorflow:1.12.0-devel \
--logdir=/mnist_data/pytorch_data/logs \
--image-pull-secrets=lumo-secret \
"python /root/code/mnist-pytorch/mnist.py --epochs 10 --backend nccl --dir /mnist_data/pytorch_data/logs --data /mnist_data/pytorch_data/"
```
2. Get the details of the job.
```shell
# arena get pytorch-git-with-secret
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 2m
NAME STATUS TRAINER AGE INSTANCE NODE
pytorch-git-with-secret RUNNING PYTORCHJOB 2m pytorch-git-with-secret-master-0 172.16.0.202
Your tensorboard will be available on:
http://172.16.0.198:31155
```
## <a name="arenaConfig">Load imagePullSecrets from configuration of Arena</a>
If you don't want to pass `--image-pull-secrets` every time you submit a job, you can set it in the Arena configuration instead.
Open the file `~/.arena/config` (create it if it doesn't exist) and add the following line:
```shell
imagePullSecrets=lumo-secret,king-secret
```
> Note:
> `--image-pull-secrets` overrides the setting in `~/.arena/config`.

View File

@ -1,62 +0,0 @@
This guide walks through the steps to deploy and serve a custom model with kfserving
1. Setup
Follow the KFServing [guide](https://github.com/kubeflow/kfserving#install-kfserving) to install KFServing. For the prerequisites, you should ensure 8 GB of memory and 4 CPU cores are available in your environment.
2. Submit your serving job to KFServing
```shell script
arena serve kfserving --name=max-object-detector --port=5000 --image=codait/max-object-detector --model-type=custom
configmap/max-object-detector-202008221942-kfserving created
configmap/max-object-detector-202008221942-kfserving labeled
inferenceservice.serving.kubeflow.org/max-object-detector-202008221942 created
```
3. List the serving job you just submitted
```shell script
arena serve list
NAME TYPE VERSION DESIRED AVAILABLE ENDPOINT_ADDRESS PORTS
max-object-detector KFSERVING 202008221942 1 1 10.97.52.65 http:80
```
4. Test the model service
##### Determine the ingress IP and ports
The first step is to [determine the ingress IP](https://github.com/kubeflow/kfserving/blob/master/README.md#determine-the-ingress-ip-and-ports) and ports, and set INGRESS_HOST and INGRESS_PORT.
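A minimal sketch for a cluster where KFServing sits behind an Istio ingress gateway in the `istio-system` namespace (adjust to your environment):
```shell script
# assumes the ingress gateway is exposed via a LoadBalancer service
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```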
This example uses the [codait/max-object-detector](https://github.com/IBM/MAX-Object-Detector) image. The Max Object Detector api server expects a POST request to the /model/predict endpoint that includes an image multipart/form-data and an optional threshold query string.
```shell script
MODEL_NAME=max-object-detector-202008221942
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
INGRESS_HOST=localhost
INGRESS_PORT=80
curl -v -F "image=@27-kfserving-custom.jpg" http://${INGRESS_HOST}:${INGRESS_PORT}/model/predict -H "Host: ${SERVICE_HOSTNAME}"
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
> POST /model/predict HTTP/1.1
> Host: max-object-detector-202008221942.default.example.com
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 125769
> Content-Type: multipart/form-data; boundary=------------------------56b67bc60fc7bdc7
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 380
< content-type: application/json
< date: Sun, 23 Aug 2020 03:27:14 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 3566
<
{"status": "ok", "predictions": [{"label_id": "1", "label": "person", "probability": 0.9440352320671082, "detection_box": [0.12420991063117981, 0.12507185339927673, 0.8423266410827637, 0.5974075794219971]}, {"label_id": "18", "label": "dog", "probability": 0.8645510673522949, "detection_box": [0.10447663068771362, 0.17799144983291626, 0.8422801494598389, 0.7320016026496887]}]}
* Connection #0 to host localhost left intact
* Closing connection 0
```
5. Delete the serving job
```shell script
arena serve delete max-object-detector --version=202008221942
inferenceservice.serving.kubeflow.org "max-object-detector-202008221942" deleted
configmap "max-object-detector-202008221942-kfserving" deleted
INFO[0001] The Serving job max-object-detector with version 202008221942 has been deleted successfully
```

View File

@ -1,175 +0,0 @@
This guide walks through the steps to submit an elastic training job with horovod.
1. Build an image for the training environment.
You can use the `registry.cn-hangzhou.aliyuncs.com/ai-samples/horovod:0.20.0-tf2.3.0-torch1.6.0-mxnet1.6.0.post0-py3.7-cuda10.1` image directly.
In addition, you can also build your own image with the help of this document [elastic-training-sample-image](https://code.aliyun.com/370272561/elastic-training-sample-image).
2. Submit an elastic training job. The example code is from [tensorflow2_mnist_elastic.py](https://github.com/horovod/horovod/blob/master/examples/elastic/tensorflow2_mnist_elastic.py)
```shell script
arena submit etjob \
--name=elastic-training \
--gpus=1 \
--workers=3 \
--max-workers=9 \
--min-workers=1 \
--image=registry.cn-hangzhou.aliyuncs.com/ai-samples/horovod:0.20.0-tf2.3.0-torch1.6.0-mxnet1.6.0.post0-py3.7-cuda10.1 \
--working-dir=/examples \
"horovodrun
-np \$((\${workers}*\${gpus}))
--min-np \$((\${minWorkers}*\${gpus}))
--max-np \$((\${maxWorkers}*\${gpus}))
--host-discovery-script /usr/local/bin/discover_hosts.sh
python /examples/elastic/tensorflow2_mnist_elastic.py
"
```
Output:
```
configmap/elastic-training-etjob created
configmap/elastic-training-etjob labeled
trainingjob.kai.alibabacloud.com/elastic-training created
INFO[0000] The Job elastic-training has been submitted successfully
INFO[0000] You can run `arena get elastic-training --type etjob` to check the job status
```
3. List your job.
```shell script
arena list
```
Output:
```
NAME STATUS TRAINER AGE NODE
elastic-training RUNNING ETJOB 52s 192.168.0.116
```
4. Get your job details.
```shell script
arena get elastic-training
```
Output:
```
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 1m
NAME STATUS TRAINER AGE INSTANCE NODE
elastic-training RUNNING ETJOB 1m elastic-training-launcher 192.168.0.116
elastic-training RUNNING ETJOB 1m elastic-training-worker-0 192.168.0.114
elastic-training RUNNING ETJOB 1m elastic-training-worker-1 192.168.0.116
elastic-training RUNNING ETJOB 1m elastic-training-worker-2 192.168.0.116
```
5. Check logs
```shell script
arena logs elastic-training --tail 10
```
Output:
```
Tue Sep 8 08:32:50 2020[1]<stdout>:Step #2170 Loss: 0.021992
Tue Sep 8 08:32:50 2020[0]<stdout>:Step #2180 Loss: 0.000902
Tue Sep 8 08:32:50 2020[1]<stdout>:Step #2180 Loss: 0.023190
Tue Sep 8 08:32:50 2020[2]<stdout>:Step #2180 Loss: 0.013149
Tue Sep 8 08:32:51 2020[0]<stdout>:Step #2190 Loss: 0.029536
Tue Sep 8 08:32:51 2020[2]<stdout>:Step #2190 Loss: 0.017537
Tue Sep 8 08:32:51 2020[1]<stdout>:Step #2190 Loss: 0.018273
Tue Sep 8 08:32:51 2020[2]<stdout>:Step #2200 Loss: 0.038399
Tue Sep 8 08:32:51 2020[0]<stdout>:Step #2200 Loss: 0.007017
Tue Sep 8 08:32:51 2020[1]<stdout>:Step #2200 Loss: 0.017495
```
6. Scale out your job. This will add one worker to the job.
```shell script
arena scaleout etjob --name="elastic-training" --count=1 --timeout=1m
```
Output:
```
configmap/elastic-training-1599548177-scaleout created
configmap/elastic-training-1599548177-scaleout labeled
scaleout.kai.alibabacloud.com/elastic-training-1599548177 created
INFO[0000] The scaleout job elastic-training-1599548177 has been submitted successfully
```
7. Get your job details. We can see the new worker (elastic-training-worker-3) is "RUNNING".
```shell script
arena get elastic-training
```
Output:
```
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 2m
NAME STATUS TRAINER AGE INSTANCE NODE
elastic-training RUNNING ETJOB 2m elastic-training-launcher 192.168.0.116
elastic-training RUNNING ETJOB 2m elastic-training-worker-0 192.168.0.114
elastic-training RUNNING ETJOB 2m elastic-training-worker-1 192.168.0.116
elastic-training RUNNING ETJOB 2m elastic-training-worker-2 192.168.0.116
elastic-training RUNNING ETJOB 2m elastic-training-worker-3 192.168.0.117
```
8. Check logs.
```shell script
arena logs elastic-training --tail 10
```
Output:
```
Tue Sep 8 08:33:33 2020[1]<stdout>:Step #3140 Loss: 0.014412
Tue Sep 8 08:33:33 2020[0]<stdout>:Step #3140 Loss: 0.004425
Tue Sep 8 08:33:33 2020[3]<stdout>:Step #3150 Loss: 0.000513
Tue Sep 8 08:33:33 2020[2]<stdout>:Step #3150 Loss: 0.062282
Tue Sep 8 08:33:33 2020[1]<stdout>:Step #3150 Loss: 0.020650
Tue Sep 8 08:33:33 2020[0]<stdout>:Step #3150 Loss: 0.008056
Tue Sep 8 08:33:34 2020[3]<stdout>:Step #3160 Loss: 0.002170
Tue Sep 8 08:33:34 2020[2]<stdout>:Step #3160 Loss: 0.009676
Tue Sep 8 08:33:34 2020[1]<stdout>:Step #3160 Loss: 0.051425
Tue Sep 8 08:33:34 2020[0]<stdout>:Step #3160 Loss: 0.023769
```
9. Scale in your job. This will remove one worker from the job.
```shell script
arena scalein etjob --name="elastic-training" --count=1 --timeout=1m
```
Output:
```
configmap/elastic-training-1599554041-scalein created
configmap/elastic-training-1599554041-scalein labeled
scalein.kai.alibabacloud.com/elastic-training-1599554041 created
INFO[0000] The scalein job elastic-training-1599554041 has been submitted successfully
```
10. Get your job details. We can see that `elastic-training-worker-3` has been removed.
```shell script
arena get elastic-training
```
Output:
```
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 3m
NAME STATUS TRAINER AGE INSTANCE NODE
elastic-training RUNNING ETJOB 3m elastic-training-launcher 192.168.0.116
elastic-training RUNNING ETJOB 3m elastic-training-worker-0 192.168.0.114
elastic-training RUNNING ETJOB 3m elastic-training-worker-1 192.168.0.116
elastic-training RUNNING ETJOB 3m elastic-training-worker-2 192.168.0.116
```
11. Check logs.
```shell script
arena logs elastic-training --tail 10
```
Output:
```
Tue Sep 8 08:34:43 2020[0]<stdout>:Step #5210 Loss: 0.005627
Tue Sep 8 08:34:43 2020[2]<stdout>:Step #5220 Loss: 0.002142
Tue Sep 8 08:34:43 2020[1]<stdout>:Step #5220 Loss: 0.002978
Tue Sep 8 08:34:43 2020[0]<stdout>:Step #5220 Loss: 0.011404
Tue Sep 8 08:34:44 2020[2]<stdout>:Step #5230 Loss: 0.000689
Tue Sep 8 08:34:44 2020[1]<stdout>:Step #5230 Loss: 0.024597
Tue Sep 8 08:34:44 2020[0]<stdout>:Step #5230 Loss: 0.040936
Tue Sep 8 08:34:44 2020[0]<stdout>:Step #5240 Loss: 0.000125
Tue Sep 8 08:34:44 2020[2]<stdout>:Step #5240 Loss: 0.026498
Tue Sep 8 08:34:44 2020[1]<stdout>:Step #5240 Loss: 0.000308
```

View File

@ -1,182 +0,0 @@
This guide walks through the steps to submit an elastic training job with horovod.
1. Build an image for the training environment.
You can use the `registry.cn-hangzhou.aliyuncs.com/ai-samples/horovod:0.20.0-tf2.3.0-torch1.6.0-mxnet1.6.0.post0-py3.7-cuda10.1` image directly.
In addition, you can also build your own image with the help of this document [elastic-training-sample-image](https://code.aliyun.com/370272561/elastic-training-sample-image).
2. Submit an elastic training job. The example code is from [pytorch_synthetic_benchmark_elastic.py](https://github.com/horovod/horovod/blob/master/examples/elastic/pytorch_synthetic_benchmark_elastic.py)
```shell script
arena submit etjob \
--name=elastic-training-synthetic \
--gpus=1 \
--workers=3 \
--max-workers=9 \
--min-workers=1 \
--image=registry.cn-hangzhou.aliyuncs.com/ai-samples/horovod:0.20.0-tf2.3.0-torch1.6.0-mxnet1.6.0.post0-py3.7-cuda10.1 \
--working-dir=/examples \
"horovodrun
--verbose
--log-level=DEBUG
-np \$((\${workers}*\${gpus}))
--min-np \$((\${minWorkers}*\${gpus}))
--max-np \$((\${maxWorkers}*\${gpus}))
--start-timeout 100
--elastic-timeout 1000
--host-discovery-script /usr/local/bin/discover_hosts.sh
python /examples/elastic/pytorch_synthetic_benchmark_elastic.py
--num-iters=10000
--num-warmup-batches=0"
```
Output:
```
configmap/elastic-training-synthetic-etjob created
configmap/elastic-training-synthetic-etjob labeled
trainingjob.kai.alibabacloud.com/elastic-training-synthetic created
INFO[0000] The Job elastic-training-synthetic has been submitted successfully
INFO[0000] You can run `arena get elastic-training-synthetic --type etjob` to check the job status
```
3. List your job.
```shell script
arena list
```
Output:
```
NAME STATUS TRAINER AGE NODE
elastic-training-synthetic RUNNING ETJOB 2m 192.168.0.112
```
4. Get your job details.
```shell script
arena get elastic-training-synthetic
```
Output:
```
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 3m
NAME STATUS TRAINER AGE INSTANCE NODE
elastic-training-synthetic RUNNING ETJOB 3m elastic-training-synthetic-launcher 192.168.0.112
elastic-training-synthetic RUNNING ETJOB 3m elastic-training-synthetic-worker-0 192.168.0.116
elastic-training-synthetic RUNNING ETJOB 3m elastic-training-synthetic-worker-1 192.168.0.117
elastic-training-synthetic RUNNING ETJOB 3m elastic-training-synthetic-worker-2 192.168.0.116
```
5. Check logs
```shell script
arena logs elastic-training-synthetic --tail 10
```
Output:
```
Tue Sep 8 09:24:20 2020[0]<stdout>:Iter #54: 95.3 img/sec per GPU
Tue Sep 8 09:24:23 2020[0]<stdout>:Iter #55: 95.3 img/sec per GPU
Tue Sep 8 09:24:27 2020[0]<stdout>:Iter #56: 94.6 img/sec per GPU
Tue Sep 8 09:24:30 2020[0]<stdout>:Iter #57: 97.1 img/sec per GPU
Tue Sep 8 09:24:33 2020[0]<stdout>:Iter #58: 99.7 img/sec per GPU
Tue Sep 8 09:24:36 2020[0]<stdout>:Iter #59: 99.8 img/sec per GPU
Tue Sep 8 09:24:40 2020[0]<stdout>:Iter #60: 98.0 img/sec per GPU
Tue Sep 8 09:24:43 2020[0]<stdout>:Iter #61: 97.1 img/sec per GPU
Tue Sep 8 09:24:46 2020[0]<stdout>:Iter #62: 96.1 img/sec per GPU
Tue Sep 8 09:24:50 2020[0]<stdout>:Iter #63: 100.4 img/sec per GPU
```
6. Scale out your job. This will add one worker to the job.
```shell script
arena scaleout etjob --name="elastic-training-synthetic" --count=1 --timeout=1m
```
Output:
```
configmap/elastic-training-synthetic-1599557124-scaleout created
configmap/elastic-training-synthetic-1599557124-scaleout labeled
scaleout.kai.alibabacloud.com/elastic-training-synthetic-1599557124 created
INFO[0000] The scaleout job elastic-training-synthetic-1599557124 has been submitted successfully
```
7. Get your job details. We can see the new worker (elastic-training-synthetic-worker-3) is "RUNNING".
```shell script
arena get elastic-training-synthetic
```
Output:
```
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 5m
NAME STATUS TRAINER AGE INSTANCE NODE
elastic-training-synthetic RUNNING ETJOB 5m elastic-training-synthetic-launcher 192.168.0.112
elastic-training-synthetic RUNNING ETJOB 5m elastic-training-synthetic-worker-0 192.168.0.116
elastic-training-synthetic RUNNING ETJOB 5m elastic-training-synthetic-worker-1 192.168.0.117
elastic-training-synthetic RUNNING ETJOB 5m elastic-training-synthetic-worker-2 192.168.0.116
elastic-training-synthetic RUNNING ETJOB 5m elastic-training-synthetic-worker-3 192.168.0.112
```
8. Check logs.
```shell script
arena logs elastic-training-synthetic --tail 10
```
Output:
```
Tue Sep 8 09:26:03 2020[0]<stdout>:Iter #76: 65.0 img/sec per GPU
Tue Sep 8 09:26:08 2020[0]<stdout>:Iter #77: 64.0 img/sec per GPU
Tue Sep 8 09:26:13 2020[0]<stdout>:Iter #78: 65.4 img/sec per GPU
Tue Sep 8 09:26:18 2020[0]<stdout>:Iter #79: 64.4 img/sec per GPU
Tue Sep 8 09:26:23 2020[0]<stdout>:Iter #80: 62.9 img/sec per GPU
Tue Sep 8 09:26:28 2020[0]<stdout>:Iter #81: 64.0 img/sec per GPU
Tue Sep 8 09:26:33 2020[0]<stdout>:Iter #82: 64.4 img/sec per GPU
Tue Sep 8 09:26:38 2020[0]<stdout>:Iter #83: 64.9 img/sec per GPU
Tue Sep 8 09:26:43 2020[0]<stdout>:Iter #84: 62.7 img/sec per GPU
Tue Sep 8 09:26:48 2020[0]<stdout>:Iter #85: 64.2 img/sec per GPU
```
9. Scale in your job. This will remove one worker from the job.
```shell script
arena scalein etjob --name="elastic-training-synthetic" --count=1 --timeout=1m
```
Output:
```
configmap/elastic-training-synthetic-1599557271-scalein created
configmap/elastic-training-synthetic-1599557271-scalein labeled
scalein.kai.alibabacloud.com/elastic-training-synthetic-1599557271 created
INFO[0000] The scalein job elastic-training-synthetic-1599557271 has been submitted successfully
```
10. Get your job details. We can see that `elastic-training-synthetic-worker-3` has been removed.
```shell script
arena get elastic-training-synthetic
```
Output:
```
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 7m
NAME STATUS TRAINER AGE INSTANCE NODE
elastic-training-synthetic RUNNING ETJOB 7m elastic-training-synthetic-launcher 192.168.0.112
elastic-training-synthetic RUNNING ETJOB 7m elastic-training-synthetic-worker-0 192.168.0.116
elastic-training-synthetic RUNNING ETJOB 7m elastic-training-synthetic-worker-1 192.168.0.117
elastic-training-synthetic RUNNING ETJOB 7m elastic-training-synthetic-worker-2 192.168.0.116
```
11. Check logs.
```shell script
arena logs elastic-training-synthetic --tail 10
```
Output:
```
DEBUG:root:host elastic-training-synthetic-worker-3 has been blacklisted, ignoring exit from local_rank=0
Process 3 exit with status code 134.
Tue Sep 8 09:27:56 2020[0]<stdout>:Iter #97: 96.0 img/sec per GPU
Tue Sep 8 09:28:00 2020[0]<stdout>:Iter #98: 95.4 img/sec per GPU
Tue Sep 8 09:28:03 2020[0]<stdout>:Iter #99: 96.9 img/sec per GPU
Tue Sep 8 09:28:06 2020[0]<stdout>:Iter #100: 97.2 img/sec per GPU
Tue Sep 8 09:28:10 2020[0]<stdout>:Iter #101: 98.5 img/sec per GPU
Tue Sep 8 09:28:13 2020[0]<stdout>:Iter #102: 95.8 img/sec per GPU
Tue Sep 8 09:28:16 2020[0]<stdout>:Iter #103: 97.3 img/sec per GPU
Tue Sep 8 09:28:20 2020[0]<stdout>:Iter #104: 97.3 img/sec per GPU
Tue Sep 8 09:28:23 2020[0]<stdout>:Iter #105: 98.9 img/sec per GPU
```

View File

@ -1,72 +0,0 @@
Arena supports and simplifies distributed TensorFlow Training (PS/worker mode).
1. To run a distributed Tensorflow Training, you need to specify:
- GPUs of each worker (only for GPU workload)
- The number of workers (required)
- The number of PS (required)
- The docker image of worker (required)
- The docker image of PS (required)
- The Port of Worker (default is 22222)
- The Port of PS (default is 22223)
The following command is an example. In this example, it defines 2 workers and 1 PS, and each worker has 1 GPU. The source code of the worker and PS is located in git, and tensorboard is enabled.
```
# arena submit tf \
--name=tf-dist-git \
--gpus=1 \
--workers=2 \
--worker-image=tensorflow/tensorflow:1.5.0-devel-gpu \
--sync-mode=git \
--sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--ps=1 \
--ps-image=tensorflow/tensorflow:1.5.0-devel \
--tensorboard \
"python code/tensorflow-sample-code/tfjob/docker/v1alpha2/distributed-mnist/main.py --log_dir=/training_logs --data_dir=code/tensorflow-sample-code/data"
configmap/tf-dist-git-tfjob created
configmap/tf-dist-git-tfjob labeled
service/tf-dist-git-tensorboard created
deployment.extensions/tf-dist-git-tensorboard created
tfjob.kubeflow.org/tf-dist-git created
INFO[0001] The Job tf-dist-git has been submitted successfully
INFO[0001] You can run `arena get tf-dist-git --type tfjob` to check the job status
```
**Note**: If the job or pod fails and the logs show that the git code could not be cloned, the cause is usually cross-border network connectivity (for example, when running containers in some regions such as China), not arena itself.
2\. Get the details of the specific job
```
# arena get tf-dist-git
NAME STATUS TRAINER AGE INSTANCE NODE
tf-dist-git RUNNING tfjob 55s tf-dist-git-tfjob-594d59789c-lrfsk 192.168.1.119
tf-dist-git RUNNING tfjob 55s tf-dist-git-tfjob-ps-0 192.168.1.118
tf-dist-git RUNNING tfjob 55s tf-dist-git-tfjob-worker-0 192.168.1.119
tf-dist-git RUNNING tfjob 55s tf-dist-git-tfjob-worker-1 192.168.1.120
Your tensorboard will be available on:
192.168.1.117:32298
```
3\. Check the tensorboard
![](3-tensorboard.jpg)
4\. Get the TFJob dashboard
```
# arena logviewer tf-dist-git
Your LogViewer will be available on:
192.168.1.120:8080/tfjobs/ui/#/default/tf-dist-git-tfjob
```
![](4-tfjob-logviewer-distributed.jpg)
Congratulations! You've run the distributed training job with `arena` successfully.

View File

@ -1,78 +0,0 @@
The distributed Tensorflow job has several roles: Worker, PS, Chief and Evaluator. Sometimes you may need to control the sequence in which they are created, for example creating the "Worker" role first and the "PS" role second. This guide shows you how.
1. Now, assume that you want to submit a distributed Tensorflow job whose four roles are Worker, PS, Chief and Evaluator, and you need the role starting sequence to be "Worker,Chief,PS,Evaluator". You only need to add the option "--role-sequence" when submitting the job. The following command is an example:
```
$ arena submit tfjob \
--name=tf-distributed-test \
--role-sequence "Worker,Chief,PS,Evaluator" \
--chief \
--evaluator \
--gpus=1 \
--workers=1 \
--worker-image=cheyang/tf-mnist-distributed:gpu \
--ps-image=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--tensorboard-image="registry.cn-hongkong.aliyuncs.com/ai-samples/tensorflow:1.12.0-devel" \
"python /app/main.py"
```
the "--role-sequence Worker,Chief,PS,Evaluator" is the same as "--role-sequence w,c,p,e" and "w" represents "Worker", "c" represents "Chief", "p" represents "PS" and "e" represents "Evaluator".
2. Make sure at least one pod belonging to the tfjob "tf-distributed-test" has annotation "job-role-sequence=Worker,Chief,PS,Evaluator":
```
$ kubectl get po -l tf-job-name=tf-distributed-test
NAME READY STATUS RESTARTS AGE
tf-distributed-test-chief-0 0/1 ContainerCreating 0 5m47s
tf-distributed-test-evaluator-0 0/1 ContainerCreating 0 5m47s
tf-distributed-test-ps-0 1/1 Running 0 5m47s
tf-distributed-test-worker-0 0/1 ContainerCreating 0 5m47s
$ kubectl get po tf-distributed-test-worker-0 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    job-role-sequence: Worker,Chief,PS,Evaluator
    kubernetes.io/psp: ack.privileged
    requestGPUsOfJobOwner: "3"
  creationTimestamp: 2021-02-22T03:07:49Z
  ....
```
3. You can validate it by querying the tf-operator logs.
```
$ kubectl get po -n arena-system
NAME READY STATUS RESTARTS AGE
et-operator-576887864c-lvmrs 1/1 Running 1 19d
mpi-operator-66b4cf9b76-kl2fm 1/1 Running 0 26d
pytorch-operator-8545c46f98-cffgw 1/1 Running 4 26d
tf-job-dashboard-78478bfc45-msbzn 1/1 Running 0 19d
tf-job-operator-554d594cff-5vxfg 1/1 Running 0 101m
```
Query the logs of tf-job-operator-554d594cff-5vxfg.
```
$ kubectl logs tf-job-operator-554d594cff-5vxfg -n arena-system | grep "the Role Sequence" | tail -n 1
{"filename":"tensorflow/controller.go:453","job":"default.tf-distributed-test","level":"info","msg":"the Role Sequence of job tf-distributed-test is: [Worker Chief PS Evaluator]","time":"2021-02-01T13:22:23Z","uid":"7db02629-4591-4e0c-a938-c6e4a1cfc074"}
```
As you can see, the sequence in which the tf-operator handles the tfjob roles matches the sequence you specified.
If you don't want to specify the role sequence every time you submit a tfjob, you can save it in the arena configuration file "~/.arena/config", like:
```
tfjob_role_sequence = Worker,PS,Chief,Evaluator
```
or
```
tfjob_role_sequence = w,p,c,e
```

View File

@ -1,128 +0,0 @@
## Support Multiple Users
In some usage scenarios, you may want multiple users to use arena, each with different permissions to operate the kubernetes cluster. This guide shows how to achieve this.
Now, assume there are 3 arena users whose privileges are described in the following table:
| User Name | User Namespace | Quota | Additional Privileges |
| --------- | -------------- | ----- |---------------------- |
| alex | workplace1 | - |-|
| bob | workplace2 |limits.cpu: "10",limits.memory: "20Gi",requests.cpu: "5",requests.memory: "10Gi" |list the jobs in the cluster scope|
| tom | workplace3 |requests.nvidia.com/gpu: 20|list the jobs in the namespace scope|
The following steps describe how to generate the kubeconfig files for these users.
1. Prepare the user configuration file. You can refer to `~/charts/user/values.yaml` or `/charts/user/values.yaml` when writing your own.
The user alex doesn't need a user configuration file, because he uses the default configuration.
The user bob's configuration file is defined as:
```
quota:
  limits.cpu: "10"
  requests.cpu: "5"
  requests.memory: "10Gi"
  limits.memory: "20Gi"
clusterRoles:
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - list
```
and store it at /tmp/bob.yaml.
The user tom's configuration file is defined as:
```
quota:
  requests.nvidia.com/gpu: 5
roles:
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - list
```
and store it at /tmp/tom.yaml.
2. Generate the user kubeconfig; the script `arena-gen-kubeconfig.sh` can help you:
```
$ arena-gen-kubeconfig.sh -h
Usage:
arena-gen-kubeconfig.sh [OPTION1] [OPTION2] ...
Options:
--user-name <USER_NAME> Specify the user name
--user-namespace <USER_NAMESPACE> Specify the user namespace
--user-config <USER_CONFIG> Specify the user config,refer the ~/charts/user/values.yaml or /charts/user/values.yaml
--force If the user has been existed,force to update the user
--delete Delete the user
--output <KUBECONFIG|USER_MANIFEST_YAML> Specify the output kubeconfig file or the user manifest yaml
--admin-kubeconfig <ADMIN_KUBECONFIG> Specify the Admin kubeconfig file
--cluster-url <CLUSTER_URL> Specify the Cluster URL,if not specified,the script will detect the cluster url
--create-user-yaml Only generate the user manifest yaml,don't apply it and create kubeconfig file
```
Firstly, create the kubeconfig file of alex:
```
$ arena-gen-kubeconfig.sh --user-name alex --user-namespace workplace1 --output /tmp/alex.kubeconfig --force
2021-02-08/11:38:44 DEBUG found arena charts in /Users/yangjunfeng/charts
2021-02-08/11:38:44 DEBUG the user configuration not set,use the default configuration file
resourcequota/arena-quota-alex created
serviceaccount/alex created
clusterrole.rbac.authorization.k8s.io/arena:workplace1:alex configured
clusterrolebinding.rbac.authorization.k8s.io/arena:workplace1:alex configured
role.rbac.authorization.k8s.io/arena:alex created
rolebinding.rbac.authorization.k8s.io/arena:alex created
configmap/arena-user-alex created
Cluster "https://192.168.1.42:6443" set.
User "alex" set.
Context "registry" created.
Switched to context "registry".
2021-02-08/11:38:48 DEBUG kubeconfig written to file /tmp/alex.kubeconfig
```
As you can see, the kubeconfig file has been created (/tmp/alex.kubeconfig).
Secondly, create the kubeconfig file of user bob:
```
$ arena-gen-kubeconfig.sh --user-name bob --user-namespace workplace2 --user-config /tmp/bob.yaml --output /tmp/bob.kubeconfig --force
```
The kubeconfig file will be stored at /tmp/bob.kubeconfig.
Thirdly, create the kubeconfig file of user tom:
```
$ arena-gen-kubeconfig.sh --user-name tom --user-namespace workplace3 --user-config /tmp/tom.yaml --output /tmp/tom.kubeconfig --force
```
The kubeconfig file will be stored at /tmp/tom.kubeconfig.
3. Make the kubeconfig file take effect by setting the KUBECONFIG environment variable:
```
$ export KUBECONFIG=/tmp/alex.kubeconfig
```
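You can then check what the generated user is actually allowed to do (a quick check with plain kubectl):
```
# list the permissions granted to the current kubeconfig user
$ kubectl auth can-i --list --namespace workplace1
```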
4. Now you can use arena to submit your training jobs.
5. If you want to delete a user, execute a command like:
```
$ arena-gen-kubeconfig.sh --user-name tom --user-namespace workplace3 --delete
```

View File

@ -1,110 +0,0 @@
`arena` allows mounting multiple data volumes into training jobs. Here is an example that mounts a data volume into the training job.
1. You need to create `/data` on the NFS server and prepare the `mnist` data:
```
# mkdir -p /nfs
# mount -t nfs -o vers=4.0 NFS_SERVER_IP:/ /nfs
# mkdir -p /data
# cd /data
# wget https://raw.githubusercontent.com/cheyang/tensorflow-sample-code/master/data/t10k-images-idx3-ubyte.gz
# wget https://raw.githubusercontent.com/cheyang/tensorflow-sample-code/master/data/t10k-labels-idx1-ubyte.gz
# wget https://raw.githubusercontent.com/cheyang/tensorflow-sample-code/master/data/train-images-idx3-ubyte.gz
# wget https://raw.githubusercontent.com/cheyang/tensorflow-sample-code/master/data/train-labels-idx1-ubyte.gz
# cd /
# umount /nfs
```
2\. Create a Persistent Volume. Modify `NFS_SERVER_IP` to your own.
```
# cat nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tfdata
  labels:
    tfdata: nas-mnist
spec:
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: NFS_SERVER_IP
    path: "/data"
# kubectl create -f nfs-pv.yaml
```
3\. Create Persistent Volume Claim.
```
# cat nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfdata
  annotations:
    description: "this is the mnist demo"
    owner: Tom
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      tfdata: nas-mnist
# kubectl create -f nfs-pvc.yaml
```
> Notice: it is suggested to add `description` and `owner`.
4\. Check the data volume
```
# arena data list
NAME ACCESSMODE DESCRIPTION OWNER AGE
tfdata ReadWriteMany this is for mnist demo myteam 43d
```
5\. Now we can submit a distributed training job with `arena`; it will download the source code from the git repository and mount the data volume `tfdata` to `/mnist_data`.
```
# arena submit tf --name=tf-dist-data \
--gpus=1 \
--workers=2 \
--workerImage=tensorflow/tensorflow:1.5.0-devel-gpu \
--syncMode=git \
--syncSource=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--ps=1 \
--psImage=tensorflow/tensorflow:1.5.0-devel \
--tensorboard \
--data=tfdata:/mnist_data \
"python code/tensorflow-sample-code/tfjob/docker/v1alpha2/distributed-mnist/main.py --log_dir /training_logs --data_dir /mnist_data"
```
> `--data` specifies the data volume to mount into all the tasks of the job, in the form `<name_of_datasource>:<mount_point_on_job>`. In this example, the data volume is `tfdata`, and the target directory is `/mnist_data`.
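Since `--data` can be repeated, you can mount several volumes at once; a hypothetical sketch (the `tfoutput` claim is made up for illustration, and the elided flags are as in step 5):
```
# arena submit tf --name=tf-dist-data \
    ... \
    --data=tfdata:/mnist_data \
    --data=tfoutput:/training_logs \
    "python code/tensorflow-sample-code/tfjob/docker/v1alpha2/distributed-mnist/main.py --log_dir /training_logs --data_dir /mnist_data"
```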
6\. From the logs, we find that the training data is extracted from `/mnist_data` instead of being downloaded from the internet directly.
```
# arena logs tf-dist-data
...
Extracting /mnist_data/train-images-idx3-ubyte.gz
Extracting /mnist_data/train-labels-idx1-ubyte.gz
Extracting /mnist_data/t10k-images-idx3-ubyte.gz
Extracting /mnist_data/t10k-labels-idx1-ubyte.gz
...
Accuracy at step 960: 0.9753
Accuracy at step 970: 0.9739
Accuracy at step 980: 0.9756
Accuracy at step 990: 0.9777
Adding run metadata for 999
```


@ -1,56 +0,0 @@
Arena supports and simplifies distributed TensorFlow training (MPI mode).
1. To run distributed training with MPI support, you need to specify:
- GPUs of each worker (only for GPU workload)
- The number of workers (required)
- The docker image of MPI worker (required)
The following command is an example. In this example, it defines 2 workers, each worker has 1 GPU, and TensorBoard is enabled.
```
# arena submit mpi \
--name=mpi-dist \
--gpus=1 \
--workers=2 \
--image=uber/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--env=GIT_SYNC_BRANCH=cnn_tf_v1.9_compatible \
--sync-mode=git \
--sync-source=https://github.com/tensorflow/benchmarks.git \
--tensorboard \
"mpirun python code/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
2\. Get the details of the specific job
```
# arena get mpi-dist
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 1d mpi-dist-mpijob-launcher-ndnw8 192.168.1.120
mpi-dist RUNNING MPIJOB 1d mpi-dist-mpijob-worker-0 192.168.1.119
mpi-dist RUNNING MPIJOB 1d mpi-dist-mpijob-worker-1 192.168.1.120
Your tensorboard will be available on:
192.168.1.117:32559
```
3\. Check the tensorboard
![](5-mpi-tensorboard.jpg)
4\. Get the MPI dashboard
```
# arena logviewer mpi-dist
Your LogViewer will be available on:
192.168.1.119:9090/#!/log/default/mpi-dist-mpijob-launcher-ndnw8/mpi?namespace=default
```
![](5-mpijob-logviewer.jpg)
Congratulations! You've run the distributed MPI training job with `arena` successfully.


@ -1,67 +0,0 @@
Arena supports distributed TensorFlow training with gang scheduling by using [kube-arbitrator](https://github.com/kubernetes-incubator/kube-arbitrator).
When running distributed TensorFlow, it is better to ensure `all` or `nothing` scheduling; gang scheduling helps with such cases.
> Notice: the current [kubernetes gang scheduler](https://github.com/kubernetes-incubator/kube-arbitrator/tree/release-0.1) is not production ready. For example, it doesn't support Pod Affinity and PodFitsHostPorts for scheduling.
> Limitation: when using the gang scheduler, the TensorBoard feature doesn't work well.
1. To enable the gang scheduler, edit `/charts/tfjob/values.yaml`
Change `enableGangScheduler: false` to `enableGangScheduler: true`.
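After the change, the relevant excerpt of the values file would read:
```
# /charts/tfjob/values.yaml (excerpt)
enableGangScheduler: true
```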
2. To run distributed TensorFlow training, you need to specify:
- GPUs of each worker (only for GPU workload)
- The number of workers (required)
- The number of PS (required)
- The docker image of worker (required)
- The docker image of PS (required)
- The Port of Worker (default is 22222)
- The Port of PS (default is 22223)
The following command is an example. In this example, it defines 2 workers and 1 PS, and each worker has 1 GPU. The source code of the worker and PS is located in git, and TensorBoard is enabled.
```
# arena submit tf --name=tf-dist-git \
--gpus=1 \
--workers=2 \
--workerImage=tensorflow/tensorflow:1.5.0-devel-gpu \
--syncMode=git \
--syncSource=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--ps=1 \
--psImage=tensorflow/tensorflow:1.5.0-devel \
"python code/tensorflow-sample-code/tfjob/docker/v1alpha2/distributed-mnist/main.py --log_dir /training_logs"
configmap/tf-dist-git-tfjob created
configmap/tf-dist-git-tfjob labeled
service/tf-dist-git-tensorboard created
deployment.extensions/tf-dist-git-tensorboard created
tfjob.kubeflow.org/tf-dist-git created
INFO[0001] The Job tf-dist-git has been submitted successfully
INFO[0001] You can run `arena get tf-dist-git --type tfjob` to check the job status
```
If there are not enough resources, all the instances of the job stay `PENDING`. Without the gang scheduler, you would see some of the pods `RUNNING` and others `PENDING`.
```
# arena get tf-dist-data
NAME STATUS TRAINER AGE INSTANCE NODE
tf-dist-data PENDING TFJOB 0s tf-dist-data-tfjob-ps-0 N/A
tf-dist-data PENDING TFJOB 0s tf-dist-data-tfjob-worker-0 N/A
tf-dist-data PENDING TFJOB 0s tf-dist-data-tfjob-worker-1 N/A
tf-dist-data PENDING TFJOB 0s tf-dist-data-tfjob-worker-2 N/A
tf-dist-data PENDING TFJOB 0s tf-dist-data-tfjob-worker-3 N/A
```
When there are enough resources, the instances become `RUNNING`:
```
NAME STATUS TRAINER AGE INSTANCE NODE
tf-dist-data RUNNING TFJOB 4s tf-dist-data-tfjob-ps-0 192.168.1.115
tf-dist-data RUNNING TFJOB 4s tf-dist-data-tfjob-worker-0 192.168.1.119
tf-dist-data RUNNING TFJOB 4s tf-dist-data-tfjob-worker-1 192.168.1.118
tf-dist-data RUNNING TFJOB 4s tf-dist-data-tfjob-worker-2 192.168.1.120
```


@ -1,140 +0,0 @@
You can also use the high-level TensorFlow API `tf.estimator.Estimator` class to run distributed TensorFlow with good modularity by using `Arena`.
1. Create the Persistent Volume. Modify `NFS_SERVER_IP` to your own.
```
# cat nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: tfdata
labels:
tfdata: nas-mnist
spec:
persistentVolumeReclaimPolicy: Retain
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
nfs:
server: NFS_SERVER_IP
path: "/data"
# kubectl create -f nfs-pv.yaml
```
2\. Create Persistent Volume Claim.
```
# cat nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: tfdata
annotations:
description: "this is the mnist demo"
owner: Tom
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 5Gi
selector:
matchLabels:
tfdata: nas-mnist
# kubectl create -f nfs-pvc.yaml
```
> Notice: it is suggested to add a `description` and an `owner`
3\. Check the data volume
```
# arena data list
NAME ACCESSMODE DESCRIPTION OWNER AGE
tfdata ReadWriteMany this is for mnist demo myteam 43d
```
4\. To run distributed TensorFlow training, you need to specify:
- GPUs of each worker (Include chief and evaluator)
- Enable chief (required)
- Enable Evaluator (optional)
- The number of workers (required)
- The number of PS (required)
- The docker image of worker and master (required)
- The docker image of PS (required)
- The Port of Chief (default is 22221)
- The Port of Worker (default is 22222)
- The Port of PS (default is 22223)
The following command is an example. In this example, it defines 1 chief, 1 worker, 1 PS, and 1 evaluator, and each worker has 1 GPU. The source code of the worker and PS is located in git, and TensorBoard is enabled.
```
# arena submit tf --name=tf-estimator \
--gpus=1 \
--workers=1 \
--chief \
--evaluator \
--data=tfdata:/data/mnist \
--logdir=/data/mnist/models \
--workerImage=tensorflow/tensorflow:1.9.0-devel-gpu \
--syncMode=git \
--syncSource=https://github.com/cheyang/models.git \
--ps=1 \
--psImage=tensorflow/tensorflow:1.9.0-devel \
--tensorboard \
"bash code/models/dist_mnist_estimator.sh --data_dir=/data/mnist/MNIST_data --model_dir=/data/mnist/models"
configmap/tf-estimator-tfjob created
configmap/tf-estimator-tfjob labeled
service/tf-estimator-tensorboard created
deployment.extensions/tf-estimator-tensorboard created
tfjob.kubeflow.org/tf-estimator created
INFO[0001] The Job tf-estimator has been submitted successfully
INFO[0001] You can run `arena get tf-estimator --type tfjob` to check the job status
```
> `--data` specifies the data volume to mount to all the tasks of the job, in the format `<name_of_datasource>:<mount_point_on_job>`. In this example, the data volume is `tfdata`, and the target directory is `/data/mnist`.
5\. From the logs, we can see that the training has started:
```
# arena logs tf-estimator
2018-09-27T00:37:01.576672145Z 2018-09-27 00:37:01.576562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:chief/replica:0/task:0/device:GPU:0 with 15123 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0)
2018-09-27T00:37:01.578669608Z 2018-09-27 00:37:01.578523: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job chief -> {0 -> localhost:22222}
2018-09-27T00:37:01.578685739Z 2018-09-27 00:37:01.578550: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> tf-estimator-tfjob-ps-0:22223}
2018-09-27T00:37:01.578705274Z 2018-09-27 00:37:01.578562: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> tf-estimator-tfjob-worker-0:22222}
2018-09-27T00:37:01.579637826Z 2018-09-27 00:37:01.579454: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:334] Started server with target: grpc://localhost:22222
2018-09-27T00:37:01.701520696Z I0927 00:37:01.701258 140281586534144 tf_logging.py:115] Calling model_fn.
2018-09-27T00:37:02.172552485Z I0927 00:37:02.172167 140281586534144 tf_logging.py:115] Done calling model_fn.
2018-09-27T00:37:02.173930978Z I0927 00:37:02.173732 140281586534144 tf_logging.py:115] Create CheckpointSaverHook.
2018-09-27T00:37:02.431259294Z I0927 00:37:02.430984 140281586534144 tf_logging.py:115] Graph was finalized.
2018-09-27T00:37:02.4472109Z 2018-09-27 00:37:02.447018: I tensorflow/core/distributed_runtime/master_session.cc:1150] Start master session b0a6d2587e64ebef with config: allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } }
...
2018-09-27T00:37:33.250440133Z I0927 00:37:33.250036 140281586534144 tf_logging.py:115] global_step/sec: 21.8175
2018-09-27T00:37:33.253100942Z I0927 00:37:33.252873 140281586534144 tf_logging.py:115] loss = 0.09276967, step = 500 (4.583 sec)
2018-09-27T00:37:37.764446795Z I0927 00:37:37.764101 140281586534144 tf_logging.py:115] Saving checkpoints for 600 into /data/mnist/models/model.ckpt.
2018-09-27T00:37:38.064104604Z I0927 00:37:38.063472 140281586534144 tf_logging.py:115] Loss for final step: 0.24215397.
```
6\. Check the training status and tensorboard
```
# arena get tf-estimator
NAME STATUS TRAINER AGE INSTANCE NODE
tf-estimator SUCCEEDED TFJOB 5h tf-estimator-tfjob-chief-0 N/A
tf-estimator RUNNING TFJOB 5h tf-estimator-tfjob-evaluator-0 192.168.1.120
tf-estimator RUNNING TFJOB 5h tf-estimator-tfjob-ps-0 192.168.1.119
tf-estimator RUNNING TFJOB 5h tf-estimator-tfjob-worker-0 192.168.1.118
Your tensorboard will be available on:
192.168.1.117:31366
```
7\. Check the tensorboard from 192.168.1.117:31366 in this sample
![](8-tfjob-estimator-tensorboard.jpg)


@ -1,65 +0,0 @@
The command `arena top job <job name>` displays GPU monitoring metrics. Before using it, you must deploy Prometheus and a node exporter for GPU metrics.
1\. Deploy Prometheus
```
kubectl apply -f kubernetes-artifacts/prometheus/prometheus.yaml
```
2\. Deploy GPU node exporter
* If your cluster is an ACK (Alibaba Cloud Kubernetes) cluster, you can simply run:
```
# change the gpu exporter nodeSelector to the aliyun label
sed -i 's|accelerator/nvidia_gpu|aliyun.accelerator/nvidia_count|g' kubernetes-artifacts/prometheus/gpu-exporter.yaml
```
* If your cluster is not an ACK cluster, you need to label your GPU nodes:
```
# label all your GPU nodes
kubectl label node <your GPU node> accelerator/nvidia_gpu=true
```
* Deploy the GPU exporter:
```
kubectl apply -f kubernetes-artifacts/prometheus/gpu-exporter.yaml
```
> Notice: the prometheus and gpu-exporter components should be deployed in the `kube-system` namespace so that `arena top job <job name>` can work.
3\. You can check the GPU metrics with a Prometheus query:
```
# kubectl get --raw '/api/v1/namespaces/arena-system/services/prometheus-svc:prometheus/proxy/api/v1/query?query=nvidia_gpu_num_devices'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"nvidia_gpu_num_devices","app":"node-gpu-exporter","instance":"172.16.1.144:9445","job":"kubernetes-service-endpoints","k8s_app":"node-gpu-exporter","kubernetes_name":"node-gpu-exporter","node_name":"mynode"},"value":[1543202894.919,"2"]}]}}
```
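If `jq` is installed, the same query can be piped through it to pull out just the metric values:
```
# kubectl get --raw '/api/v1/namespaces/arena-system/services/prometheus-svc:prometheus/proxy/api/v1/query?query=nvidia_gpu_num_devices' | jq '.data.result[].value'
```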
4\. Submit a training job with arena:
```
arena submit tf --name=style-transfer \
--gpus=2 \
--workers=2 \
--workerImage=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/neural-style:gpu \
--workingDir=/neural-style \
--ps=1 \
--psImage=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/style-transfer:ps \
"python neural_style.py --styles /neural-style/examples/1-style.jpg --iterations 1000000"
```
5\. Check GPU metrics for the job you deployed
```
# arena top job style-transfer
INSTANCE NAME STATUS NODE GPU(Device Index) GPU(Duty Cycle) GPU(Memory MiB)
style-transfer-tfjob-ps-0 Running 192.168.0.95 N/A N/A N/A
style-transfer-tfjob-worker-0 Running 192.168.0.98 0 98% 15641MiB / 16276MiB
1 0% 15481MiB / 16276MiB
style-transfer-tfjob-worker-1 Running 192.168.0.99 0 98% 15641MiB / 16276MiB
1 0% 15481MiB / 16276MiB
```


@ -1,139 +0,0 @@

This example shows how to use `Arena` to train a machine learning model. The example downloads the source code from a git URL.
1. The first step is to check the available GPU resources:
```
arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
i-j6c68vrtpvj708d9x6j0 192.168.1.116 master 0 0
i-j6c8ef8d9sqhsy950x7x 192.168.1.119 worker 1 0
i-j6c8ef8d9sqhsy950x7y 192.168.1.120 worker 1 0
i-j6c8ef8d9sqhsy950x7z 192.168.1.118 worker 1 0
i-j6ccue91mx9n2qav7qsm 192.168.1.115 master 0 0
i-j6ce09gzdig6cfcy1lwr 192.168.1.117 master 0 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/3 (0%)
```
There are 3 available nodes with GPUs for running training jobs.
2\. Now we can submit a training job with `arena`; this example downloads the source code from GitHub:
```
#arena submit tf \
--name=tf-git \
--gpus=1 \
--image=tensorflow/tensorflow:1.5.0-devel-gpu \
--sync-mode=git \
--sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
"python code/tensorflow-sample-code/tfjob/docker/mnist/main.py --max_steps 10000 --data_dir=code/tensorflow-sample-code/data"
configmap/tf-git-tfjob created
configmap/tf-git-tfjob labeled
tfjob.kubeflow.org/tf-git created
INFO[0000] The Job tf-git has been submitted successfully
INFO[0000] You can run `arena get tf-git --type tfjob` to check the job status
```
> This downloads the source code and extracts it into the `code/` directory under the working directory. The default working directory is `/root`; you can also specify it with `--workingDir`. You can also choose the branch to pull by adding `--env GIT_SYNC_BRANCH=main` to the submit command. Note: new GitHub repositories use `main` as the default branch instead of `master`.
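For example, the same job pinned to the `main` branch would be submitted like this (a sketch; whether that branch exists in this sample repository is not guaranteed):
```
# arena submit tf \
    --name=tf-git \
    --gpus=1 \
    --image=tensorflow/tensorflow:1.5.0-devel-gpu \
    --sync-mode=git \
    --sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
    --env=GIT_SYNC_BRANCH=main \
    "python code/tensorflow-sample-code/tfjob/docker/mnist/main.py"
```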
> If you are using a private git repository, you can use the following command:
```
#arena submit tf \
--name=tf-git \
--gpus=1 \
--image=tensorflow/tensorflow:1.5.0-devel-gpu \
--sync-mode=git \
--sync-source=https://code.aliyun.com/xiaozhou/tensorflow-sample-code.git \
--env=GIT_SYNC_USERNAME=yourname \
--env=GIT_SYNC_PASSWORD=yourpwd \
"python code/tensorflow-sample-code/tfjob/docker/mnist/main.py"
```
Note: `arena` uses [git-sync](https://github.com/kubernetes/git-sync/blob/master/cmd/git-sync/main.go) to synchronize the source code. You can set the environment variables defined in the git-sync project.
3\. List all jobs:
```
#arena list
NAME STATUS TRAINER AGE NODE
tf-git RUNNING tfjob 0s 192.168.1.120
```
4\. Check the GPU resources used by the job:
```
#arena top job
NAME STATUS TRAINER AGE NODE GPU(Requests) GPU(Allocated)
tf-git RUNNING TFJOB 17s 192.168.1.120 1 1
Total Allocated GPUs of Training Job:
1
Total Requested GPUs of Training Job:
1
```
5\. Check the GPU resources used by the cluster:
```
#arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
i-j6c68vrtpvj708d9x6j0 192.168.1.116 master 0 0
i-j6c8ef8d9sqhsy950x7x 192.168.1.119 worker 1 0
i-j6c8ef8d9sqhsy950x7y 192.168.1.120 worker 1 1
i-j6c8ef8d9sqhsy950x7z 192.168.1.118 worker 1 0
i-j6ccue91mx9n2qav7qsm 192.168.1.115 master 0 0
i-j6ce09gzdig6cfcy1lwr 192.168.1.117 master 0 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
1/3 (33%)
```
6\. Get the details of a specific job:
```
#arena get tf-git
NAME STATUS TRAINER AGE INSTANCE NODE
tf-git RUNNING TFJOB 5s tf-git-tfjob-worker-0 192.168.1.120
```
7\. Check the logs:
```
#arena logs tf-git
2018-07-22T23:56:20.841129509Z WARNING:tensorflow:From code/tensorflow-sample-code/tfjob/docker/mnist/main.py:119: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
2018-07-22T23:56:20.841211064Z Instructions for updating:
2018-07-22T23:56:20.841217002Z
2018-07-22T23:56:20.841221287Z Future major versions of TensorFlow will allow gradients to flow
2018-07-22T23:56:20.841225581Z into the labels input on backprop by default.
2018-07-22T23:56:20.841229492Z
...
2018-07-22T23:57:11.842929868Z Accuracy at step 920: 0.967
2018-07-22T23:57:11.842933859Z Accuracy at step 930: 0.9646
2018-07-22T23:57:11.842937832Z Accuracy at step 940: 0.967
2018-07-22T23:57:11.842941362Z Accuracy at step 950: 0.9674
2018-07-22T23:57:11.842945487Z Accuracy at step 960: 0.9693
2018-07-22T23:57:11.842949067Z Accuracy at step 970: 0.9687
2018-07-22T23:57:11.842952818Z Accuracy at step 980: 0.9688
2018-07-22T23:57:11.842956775Z Accuracy at step 990: 0.9649
2018-07-22T23:57:11.842961076Z Adding run metadata for 999
```
8\. The log viewer provides more information about the training job:
```
#arena logviewer tf-git
Your LogViewer will be available on:
192.168.1.120:8080/tfjobs/ui/#/default/tf-git-tfjob
```
![](1-tfjob-logviewer.jpg)
Congratulations! You have successfully run your first training job with `arena`.


@ -1,168 +0,0 @@
# Example: MPIJob preemption with Arena
## Prerequisites
- k8s > 1.11
1. Create `PriorityClass` objects with the following YAML, which defines two priorities, `critical` and `medium`:
```yaml
apiVersion: scheduling.k8s.io/v1beta1
description: Used for the critical app
kind: PriorityClass
metadata:
name: critical
value: 1100000
---
apiVersion: scheduling.k8s.io/v1beta1
description: Used for the medium app
kind: PriorityClass
metadata:
name: medium
value: 1000000
```
Save the content above to a file named `pc.yaml`, then create the objects with:
```
kubectl create -f pc.yaml
```
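Under the hood, a `PriorityClass` takes effect through the standard `priorityClassName` field on a pod; the `--priority` flag used below has arena wire this up for the job's pods, roughly as in this sketch:
```yaml
# Sketch of what the submitted pods end up carrying (set by arena, not written by hand):
spec:
  priorityClassName: medium
```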
2. The arena command shows that there is only one available GPU card in the current Kubernetes cluster:
```
# arena top node
NAME IPADDRESS ROLE GPU(Total) GPU(Allocated)
192.168.0.20 192.168.0.20 master 0 0
192.168.0.21 192.168.0.21 master 0 0
192.168.0.22 192.168.0.22 master 0 0
192.168.0.23 192.168.0.23 <none> 1 0
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster:
0/1 (0%)
```
3. Submit an MPI training job with priority `medium`, as in the following example:
```
# arena submit mpi \
--name=medium \
--priority=medium \
--gpus=1 \
--workers=1 \
--image=registry.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
"mpirun tail -f /dev/null"
configmap/medium-mpijob created
configmap/medium-mpijob labeled
mpijob.kubeflow.org/medium created
INFO[0000] The Job medium has been submitted successfully
INFO[0000] You can run `arena get medium --type mpijob` to check the job status
```
4. Check the job's status:
```
# arena get medium
STATUS: RUNNING
NAMESPACE: default
PRIORITY: medium
TRAINING DURATION: 58s
NAME STATUS TRAINER AGE INSTANCE NODE
medium RUNNING MPIJOB 58s medium-launcher-sz5xj 192.168.0.23
medium RUNNING MPIJOB 58s medium-worker-0 192.168.0.23
```
5. You can see that the job occupies the only GPU card:
```
# arena top node -d
NAME: cn-hangzhou.192.168.0.23
IPADDRESS: 192.168.0.23
ROLE: <none>
NAMESPACE NAME GPU REQUESTS GPU LIMITS
default medium-worker-0 1 1
Total GPUs In Node cn-hangzhou.192.168.0.23: 1
Allocated GPUs In Node cn-hangzhou.192.168.0.23: 1 (100%)
-----------------------------------------------------------------------------------------
Allocated/Total GPUs In Cluster: 1/1 (100%)
```
6. Submit another MPI training job, this time with priority `critical`:
```
# arena submit mpi \
--name=critical \
--priority=critical \
--gpus=1 \
--workers=1 \
--image=registry.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
"mpirun tail -f /dev/null"
```
7. Check the events of the MPI training job `medium`: it has been evicted. It was evicted because the pods of the higher-priority job `critical` also requested GPU resources, and with only one GPU available in the cluster, `medium-worker-0` of the lower-priority job `medium` was preempted:
```
# kubectl get events --field-selector involvedObject.name=medium-worker-0
LAST SEEN TYPE REASON OBJECT MESSAGE
15m Normal Scheduled pod/medium-worker-0 Successfully assigned default/medium-worker-0 to 192.168.0.23
14m Normal Pulled pod/medium-worker-0 Container image "registry.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5" already present on machine
14m Normal Created pod/medium-worker-0 Created container mpi
14m Normal Started pod/medium-worker-0 Started container mpi
2m32s Normal Preempted pod/medium-worker-0 by default/critical-worker-0 on node 192.168.0.23
2m32s Normal Killing pod/medium-worker-0 Stopping container mpi
```
8. Check the details of the MPI training job `medium`: the job is now in the FAILED state.
```
# arena get medium
STATUS: FAILED
NAMESPACE: default
PRIORITY: medium
TRAINING DURATION: 12m
NAME STATUS TRAINER AGE INSTANCE NODE
medium FAILED MPIJOB 20m medium-launcher-sz5xj 192.168.0.23
```
9. Check the details of the MPI training job `critical`: the job is running.
```
# arena get critical
STATUS: RUNNING
NAMESPACE: default
PRIORITY: critical
TRAINING DURATION: 10m
NAME STATUS TRAINER AGE INSTANCE NODE
critical RUNNING MPIJOB 10m critical-launcher-mfffs 192.168.0.23
critical RUNNING MPIJOB 10m critical-worker-0 192.168.0.23
```
10. You can also see with `arena top node -d` that the GPU is now occupied by the MPI training job `critical`.
```
# arena top node -d
NAME: cn-hangzhou.192.168.0.23
IPADDRESS: 192.168.0.23
ROLE: <none>
NAMESPACE NAME GPU REQUESTS GPU LIMITS
default critical-worker-0 1 1
Total GPUs In Node cn-hangzhou.192.168.0.23: 1
Allocated GPUs In Node cn-hangzhou.192.168.0.23: 1 (100%)
-----------------------------------------------------------------------------------------
```
Congratulations! You can now achieve MPIJob priority preemption with arena.


@ -1,159 +0,0 @@
Arena supports assigning submitted jobs to specific nodes (currently only mpi and tf jobs are supported).
Some usage examples follow.
1. Query the k8s cluster information:
```
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-beijing.192.168.3.225 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.226 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.227 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.228 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.229 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.230 Ready <none> 2d22h v1.12.6-aliyun.1
```
2. Label some k8s nodes. For example, label the nodes "cn-beijing.192.168.3.228" and "cn-beijing.192.168.3.229" with "gpu_node=ok", and label the node "cn-beijing.192.168.3.230" with "ssd_node=ok".
```
# kubectl label nodes cn-beijing.192.168.3.228 gpu_node=ok
node/cn-beijing.192.168.3.228 labeled
# kubectl label nodes cn-beijing.192.168.3.229 gpu_node=ok
node/cn-beijing.192.168.3.229 labeled
# kubectl label nodes cn-beijing.192.168.3.230 ssd_node=ok
node/cn-beijing.192.168.3.230 labeled
```
## MPI jobs
1. When submitting jobs, you can use the "--selector" option to determine which nodes they run on:
```
# arena submit mpi --name=mpi-dist \
--gpus=1 \
--workers=1 \
--selector gpu_node=ok \
--image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--tensorboard \
--loglevel debug \
"mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
2. Query the job information:
```
# arena get mpi-dist
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 21s
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 21s mpi-dist-launcher-7jn4q 192.168.3.229
mpi-dist RUNNING MPIJOB 21s mpi-dist-worker-0 192.168.3.229
Your tensorboard will be available on:
http://192.168.3.225:31611
```
The job is now running on node cn-beijing.192.168.3.229 (IP 192.168.3.229).
3. You can use the "--selector" option multiple times. For example, passing "--selector gpu_node=ok --selector ssd_node=ok" to the arena submit command means the job must run on nodes that carry both the "gpu_node=ok" and the "ssd_node=ok" labels, as in the sketch below.
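A sketch of such a submission (with the labels above, no node carries both labels, so this particular job would stay pending until one does):
```
# arena submit mpi --name=mpi-dist \
    --gpus=1 \
    --workers=1 \
    --selector gpu_node=ok \
    --selector ssd_node=ok \
    --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
    "mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod"
```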
## TF jobs
1. Since a tf job has four kinds of roles ("PS", "Worker", "Evaluator", "Chief"), you can use "--selector" to specify which nodes the job runs on:
```
arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--selector ssd_node=ok \
--work-image=cheyang/tf-mnist-distributed:gpu \
--ps-image=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--loglevel debug \
"python /app/main.py"
```
Check the status with the following command:
```
# arena get tf
STATUS: PENDING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 24s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 24s tf-ps-0 192.168.3.230
tf PENDING TFJOB 24s tf-worker-0 192.168.3.230
Your tensorboard will be available on:
http://192.168.3.225:31867
```
Both the "PS" and the "Worker" have been scheduled to node cn-beijing.192.168.3.230 (IP 192.168.3.230, labeled "ssd_node=ok").
2. You can also choose nodes per role. For example, to run the "PS" on nodes labeled "ssd_node=ok" and the "Worker" on nodes labeled "gpu_node=ok", use "--ps-selector" and "--worker-selector":
```
arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--ps-selector ssd_node=ok \
--worker-selector gpu_node=ok \
--work-image=cheyang/tf-mnist-distributed:gpu \
--ps-image=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--loglevel debug \
"python /app/main.py"
```
Check the job status:
```
# arena get tf
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 23s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 23s tf-ps-0 192.168.3.230
tf RUNNING TFJOB 23s tf-worker-0 192.168.3.228
Your tensorboard will be available on:
http://192.168.3.225:30162
```
"PS" job运行在节点cn-beijing.192.168.3.230(ip是192.168.3.230,标签是"ssd_node=ok")"Worker" job运行在节点cn-beijing.192.168.3.228(ip是192.168.3.228,标签是"gpu_node=ok")上。
3.如果你同时使用"--selector"和"--ps-selector"(或者"--worker-selector","--evaluator-selector","chief-selector"),那么"--ps-selector"的值会覆盖"--selector"的值。,例如:
```
arena submit tfjob \
--name=tf \
--gpus=1 \
--workers=1 \
--ps-selector ssd_node=ok \
--selector gpu_node=ok \
--work-image=cheyang/tf-mnist-distributed:gpu \
--ps-image=cheyang/tf-mnist-distributed:cpu \
--ps=1 \
--tensorboard \
--loglevel debug \
"python /app/main.py"
```
In principle "--selector" applies to all roles, so in the command above every role would be scheduled to nodes labeled gpu_node=ok; but because "--ps-selector" is also given, the "PS" is scheduled to nodes labeled ssd_node=ok instead of gpu_node=ok.
```
# arena get tf
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 39s
NAME STATUS TRAINER AGE INSTANCE NODE
tf RUNNING TFJOB 39s tf-ps-0 192.168.3.230
tf RUNNING TFJOB 39s tf-worker-0 192.168.3.228
Your tensorboard will be available on:
http://192.168.3.225:32105
```
As you can see, the "PS" is scheduled to a node labeled "ssd_node=ok", and the other roles are scheduled to nodes labeled "gpu_node=ok".


@ -1,83 +0,0 @@
Arena supports running submitted jobs on tainted k8s nodes (currently only mpi and tf jobs are supported).
Some usage examples follow.
1. Query the k8s cluster information:
```
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-beijing.192.168.3.225 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.226 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.227 Ready master 2d23h v1.12.6-aliyun.1
cn-beijing.192.168.3.228 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.229 Ready <none> 2d22h v1.12.6-aliyun.1
cn-beijing.192.168.3.230 Ready <none> 2d22h v1.12.6-aliyun.1
```
2. Taint some k8s nodes. For example, taint the nodes "cn-beijing.192.168.3.228" and "cn-beijing.192.168.3.229" with "gpu_node=invalid:NoSchedule", and taint the node "cn-beijing.192.168.3.230" with "ssd_node=invalid:NoSchedule". Now no pods can be scheduled onto these nodes.
```
# kubectl taint nodes cn-beijing.192.168.3.228 gpu_node=invalid:NoSchedule
node/cn-beijing.192.168.3.228 tainted
# kubectl taint nodes cn-beijing.192.168.3.229 gpu_node=invalid:NoSchedule
node/cn-beijing.192.168.3.229 tainted
# kubectl taint nodes cn-beijing.192.168.3.230 ssd_node=invalid:NoSchedule
node/cn-beijing.192.168.3.230 tainted
```
3. When submitting a job, you can use "--toleration" to tolerate some tainted k8s nodes:
```
# arena submit mpi --name=mpi-dist \
--gpus=1 \
--workers=1 \
--toleration ssd_node \
--image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--tensorboard \
--loglevel debug \
"mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
Query the job information:
```
# arena get mpi-dist
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 29s
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 29s mpi-dist-launcher-jgms7 192.168.3.230
mpi-dist RUNNING MPIJOB 29s mpi-dist-worker-0 192.168.3.230
Your tensorboard will be available on:
http://192.168.3.225:30052
```
The job is now running on node cn-beijing.192.168.3.230 (IP 192.168.3.230, tainted with ssd_node=invalid).
4. You can use "--toleration" multiple times in one command. For example, "--toleration gpu_node --toleration ssd_node" tolerates both nodes tainted with "gpu_node" and nodes tainted with "ssd_node":
```
# arena submit mpi --name=mpi-dist \
--gpus=1 \
--workers=1 \
--toleration ssd_node \
--toleration gpu_node \
--image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
--tensorboard \
--loglevel debug \
"mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod --train_dir=/training_logs --summary_verbosity=3 --save_summaries_steps=10"
```
Query the job status:
```
# arena get mpi-dist
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 29s
NAME STATUS TRAINER AGE INSTANCE NODE
mpi-dist RUNNING MPIJOB 29s mpi-dist-launcher-jgms7 192.168.3.229
mpi-dist RUNNING MPIJOB 29s mpi-dist-worker-0 192.168.3.230
Your tensorboard will be available on:
http://192.168.3.225:30052
```
5. You can use "--toleration all" to tolerate all taints on all nodes, as in the sketch below.
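A sketch of a submission using it (reusing the image from the earlier steps):
```
# arena submit mpi --name=mpi-dist \
    --gpus=1 \
    --workers=1 \
    --toleration all \
    --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod:0.13.11-tf1.10.0-torch0.4.0-py3.5 \
    "mpirun python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod"
```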


@ -1,75 +0,0 @@
# Serving a trained model with arena
You can use arena to deploy a trained model and access it through a RESTful API. To illustrate, we use the sample project [fast-style-transfer](https://github.com/floydhub/fast-style-transfer); to save time, we take the model this project has already trained and bake it into the docker image.
### 1. Deploy the trained model
The script app.py in the project starts a RESTful server. You can deploy the model with the following command:
```
# arena serve custom \
--name=fast-style-transfer \
--gpus=1 \
--version=alpha \
--replicas=1 \
--restful-port=5000 \
--image=happy365/fast-style-transfer:latest \
"python app.py"
```
Check the status of the serving job:
```
# arena serve list
NAME TYPE VERSION DESIRED AVAILABLE ENDPOINT_ADDRESS PORTS
fast-style-transfer CUSTOM alpha 1 0 172.21.8.94 grpc:8001,restful:5000
```
Because the docker image is fairly large, pulling it takes some time. We can use "kubectl" to check the pod status:
```
# kubectl get po
NAME READY STATUS RESTARTS AGE
fast-style-transfer-alpha-custom-serving-845ffbf7dd-btbhj 0/1 ContainerCreating 0 6m44s
```
### 2. Access the service
We can use a container equipped with the curl command as a client to access the service we just created. First, create the client:
```
# kubectl run sample-client \
--generator=run-pod/v1 \
--image=happy365/arena-serve-custem-sample-client:latest \
--command -- \
/bin/sleep infinity
```
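Note: the `--generator` flag has been removed from newer versions of kubectl, where `kubectl run` creates a bare pod by default; on recent clusters the equivalent command is simply:
```
# kubectl run sample-client \
    --image=happy365/arena-serve-custem-sample-client:latest \
    --command -- \
    /bin/sleep infinity
```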
Then query the client's status:
```
# kubectl get po sample-client
NAME READY STATUS RESTARTS AGE
sample-client 1/1 Running 0 87s
```
Before accessing the custom service from the client, we need to look up the service name, which combines the job name and the version (in this example, the job name is fast-style-transfer and the version is alpha):
```
# kubectl get svc fast-style-transfer-alpha
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fast-style-transfer-alpha ClusterIP 172.21.1.114 <none> 5000/TCP 31m
```
Now we can use kubectl exec to enter the container:
```
# kubectl exec -ti sample-client /bin/sh
#
```
Then, inside the container, use the curl command to access the custom service created by arena:
```
# curl -o /root/output/beijing_out.jpg -F "file=@/root/input/beijing.jpg" http://fast-style-transfer-alpha:5000
```
In the command above, the input file is "beijing.jpg" ![beijing.jpg](15-custom-serving-sample-beijing.jpg), stored under "/root/input", and the output is written to "/root/output/beijing_out.jpg". Now exit the container and run kubectl cp on the master node to copy the result out of the container:
```
# kubectl cp sample-client:/root/output/beijing_out.jpg ~/beijing_out.jpg
```
The image "beijing_out.jpg" ![beijing_out.jpg](15-custom-serving-sample-beijing_out.jpg) will be copied to the current user's home directory.
