Commit Graph

23 Commits

Author SHA1 Message Date
Chen Sun 90ed23d183
chore: exclude `sdk-` tags when preparing base deployment for upgrade test (#9477)
* exclude `sdk-` tags when preparing base deployment for upgrade test

* Update deploy-pipeline-lite.sh

* Update deploy-pipeline-lite.sh

* Update upgrade-tests.sh
2023-05-23 00:08:35 +00:00
Yuan Gong 697c041c36 test: temporarily skip verify workload identity binding 2021-07-15 05:14:07 +00:00
Alexey Volkov cc83e1089b
Assigned copyright to the project authors (#5587) 2021-05-05 13:53:22 +08:00
Yuan (Bob) Gong 45a91f6699
feat(deployment): GCP managed storage - detailed instructions to set up workload identity bindings before deployment (#4232)
* feat: allow creating workload identity bindings before deploying KFP

* more instructions

* fix formatting

* fixes

* Update doc ref

* fix storage role

* fix storage role

* add viewer KSA to standalone manifest

* fix missing configmap

* update documentation
2020-07-16 23:13:00 -07:00
dushyanthsc bc3c59aea1
MetadataStore: Update to release metadata-envoy in each release (#4026) 2020-06-23 19:07:17 -07:00
Yuan (Bob) Gong 39805acc9e
[Manifest] Use kustomize native image transformer to override image (#3776)
* [Manifest] Use kustomize native image transformer to override image

* Revert unintended changes

* Fix kustomization.yaml location

* Fix inverse proxy image
2020-05-18 21:23:36 -07:00
Renmin db8042a846
Fix test which uses Kustomize edit image but can't work with valueRef (#3572)
Passes the upgrade / installation test. Submitting now.

The e2e test fails but not due to this PR. Submit this PR to unlock KIR side
2020-04-21 20:10:41 +08:00
Yuan (Bob) Gong 9067184815
Fix concurrent IAM policy changes flakiness (#3504) 2020-04-14 00:49:10 -07:00
Yuan (Bob) Gong 2500812914
[Testing] Reduce image build flakiness by share and retry cloudbuild jobs (#3492)
* Let presubmit tests share and retry cloudbuild

* Fix ongoing_build_ids

* Add retry for workload identity binding

* Fix build id

* fix

* Parallelize image building for api server and others

* Fix

* fix

* fix

* Fix again

* Allow retry twice instead

* Update deploy-pipeline-lite.sh

* Update batch_build.yaml

* Refine log and retry tests

* Update log and retry

* Update and retry

* Update build-images.sh
2020-04-13 20:33:11 -07:00
Jiaxiao Zheng bcb16ef62d
[Test] fix upgrade test (#3469)
* update deploy-pipeline-lite.sh

* fix

* fix?

* revert
2020-04-07 23:11:44 -07:00
Rui Fang 85257a06ea
[Manifest] Cache - MKP deployment (#3430)
* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Add cache manifests for mkp deployment

* revert go.sum

* Add helm on delete policy for cache deployer job

* Change cache deployer job to statefulset

* remove unnecessary cluster role

* separate clusterrole and role

* add role and rolebinding to mkp

* change secret role to clusterrole

* Add cloudsql support to cache

* fix comma

* Change cache secret clusterrole to role

* Adjust sequences of resources

* Update values and schema

* remove extra tab

* Change statefulset to job

* Add pod delete permission to cache deployer role

* Test changing cache deployer job to deployment

* remove extra permission

* remove statefulset check
2020-04-06 16:53:19 -07:00
Rui Fang 8e137a1ba6
[Manifest] Cache - Enable cache and cache deployer in base kustomization file (#3376)
* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Change cache deployer job to stateful set

* Delete cache deployer job

* Delete cache deployer job after it completes

* minor fix

* fix indentation

* Change cache deployer job to statefulset

* Remove extra cluster role for cache deployer

* remove cache in base kustomize file for upgrade test

* minor fix

* Enable cache and cache-deployer in base kustomization file

* fix

* fix

* test

* test

* test

* Refactor cluster scope resources

* refactor

* Add namespace for sa

* Fix

* Add crds folder to cluster kustomization yaml

* namespace change

* fix

* fix

* fix

* update test

* Rename cluster to cluster-scoped-resource

* test adding namespace in kustomization file

* revert namespace for clusterrolebinding

* fix

* Add db_name in cache_deployment manifest

* rename

* change secret cluster role to role
2020-04-02 14:37:04 -07:00
Yuan (Bob) Gong 5391e88fbc
[Testing] KFP standalone test infra for upgradability (#1971)
* Implement upgrade test

* mark upgrade-tests.sh as executable

* Fix comments

* Base upgrade_test_setup.yaml

* e2e integration of upgrade test

* Fix entrypoint argument

* Fix e2e workflow yaml

* Fix run_test.sh argument processing

* Fix no closing backtick

* Restructure upgrade_test.go to focus the test on upgrade verification

* clean up code

* Clean up after upgrade test when it is run in integration tests.

* Include pipeline tests in upgrade test

* Reorder tests

* Add upgrade test coverage for run api resources

* Add job api resource coverage in upgrade test & refactored upgrade test

* Fix add missing step in upgrade test

* Fix BUILD.bazel

* Fix upgrade_test.go

* Try to fix upgrade test failure

* Fix hard coded namespace

* Sync upgrade-tests.sh with new changes in presubmit-tests-with-pipeline-deployment.sh

* Update upgrade test

* Remove redundant code

* Fix integration test exit code

* Fix trigger interval second mismatch
2020-03-09 16:53:37 -07:00
Rui Fang ccdb885519
[Backend] Initial execution cache (#3036)
* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* Add initial server logic

* Add const

* Change folder name

* Change execution key name

* Fix unit test

* Add Dockerfile and OWNERS file

This commit adds Dockerfile for building source code and OWNERS file for
easy review. This commit also renames some functions.

* fix go.sum

This PR fixes changes on go.sum

* Add local deployment scripts

This commit adds local deployment scripts which can deploy cache service
to an existing cluster with KFP installed.

* refactor src code

* Add standalone deployment scripts and yamls

This commit adds execution cache deployment scripts and yaml files in
KFP standalone deployment. Including a deployer which will generate the
certification and mutatingwebhookconfiguration and execution cache
deployment.

* Minor fix

* Add execution cache image build in test folder

* fix test cloudbuild

* Fix cloudbuild

* Add execution cache deployer image to test folder

* Add copyright

* Fix deployer build

* Add license for execution cache and cloudbuild for execution cache
images

This commit adds licenses for execution cache source code. Also adds
cloud build step for building cache image and cache deployer image.
Change the manifest name based on changed image.

* Refactor license intermediate data

* Fix execution cache image manifest

* Typo fix for cache and cache deployer images

* Add arguments in ca generation scripts and change deployer base image to google/cloud

* minor fix

* fix arg

* Mirror source code with MPL in execution_cache image

* Minor fix

* minor refactor on error handling

* Refactor cache source code, Docker image and manifest

* Fix variable names

* Add images in .release.cloudbuild.yaml

* Change execution_cache to generic name

* revise readme

* Move deployer job out of upgrade script

* fix tests

* fix tests

* Separate cache service and cache deployer job

* mysql set up

* Delete cache service in manifest, only test in presubmit tests

* fix

* fix presubmit tests

* fix

* fix

* revert unnecessary change

* fix cache image tag

* change image gcr to ml-pipeline-test

* Remove namespace in standalone manifest and add to test manifest
2020-03-03 16:13:47 -08:00
Yuan (Bob) Gong 30b79255bd
[Testing] Reduce flakiness caused by iam bindings (#3008)
* Add retry to iam policy bindings

* Add retry for iam policy changes to reduce flakiness
2020-02-07 00:05:43 -08:00
IronPan a0a39a5eda Install application CRD and add pipeline application CR to pipeline standalone (#2585)
* install application CRD and add pipeline application CR

* add labels and let application manager set ownerref

* fix

* address comments

* update test

* update test

* update test

* update readme

* fix test

* update

* update

* update

* Update application-crd.yaml

* fix

* fix

* Update .release.cloudbuild.yaml

* update tests

* Update kustomization.yaml

* Update deploy-pipeline-lite.sh

* Update ml-pipeline-viewer-crd-sa.yaml

* update tests

* update tests

* update tests
2020-01-16 09:20:25 -08:00
Alexey Volkov dc34a3568d Service - Metadata writer (#2674)
* Metadata writer

* Added sleeper-based metadata writer

* Sleeper

* First working draft

* Added properties to Executions Artifacts and Contexts

Also added attributions.
context_id is now stored as label.

* Prefix the execution type names

* Ignoring TFX pods

* Fixed the deployment container spec

* Cleaned up the file and added deployment spec

* Added the Kubernetes deployment

* Added startup logging

* Made python output unbuffered

* Fixed None exception

* Formatting exceptions

* Prefixing the log message

* Improved handling non-S3 artifacts

* Logging input artifacts

* Extracted code to the link_execution_to_input_artifact function

* Setting execution's pipeline_name to workflow name

* Adding annotation with input artifact IDs

* Running infinitely

* Added component version to execution type name

* Marking metadata as written even for failed pods

* Cleaned up some comments

* Do not fail when upstream artifact is missing

* Change the completion detection logic

Waiting for Argo's "completed=true" instead of Kubernetes' "phase: Completed" introduced delays that lead to problems with missing input artifacts.
This change allows us to log the output artifacts earlier.

* Added Dockerfile

* Added release deployment manifest

* Added OWNERS

* Switching to using MLMD service instead of direct DB access

* Adding licenses to the image

* Pinned Python's minor version

* Moved code to /backend/metadata_writer

Moved manifest to /manifests

* Added image building to CloudBuild

* Added Metadata Writer to release CloudBuild

* Added Metadata Writer to test scripts

* Finished the kustomization manifests

* Added Metadata Writer to marketplace manifests

* Added ServiceAccount, Role and RoleBinding for MW

* Fixed merge conflict

* Removed the debug deployment

* Forgot to add the chart templates for the SA and roles

* Specified the service account

* Switched to watching a single namespace

* Resolved feedback

Removed dev deployment comment from python code.
Added license.
Fixed the range of kubernetes package versions.

* More review fixes

* Extracted the metadata helper functions

* Improved the error message when context type is unexpected

* Fixed the import

* Checking the connection to MLMD

The latest tests started to have connection problems - "failed to connect to all addresses" and "Failed to pick subchannel".

* Improved the MLMD connection error logging

* Try creating MLMD client on each retry and using a different request

* Changed the MLMD connection check request

All get requests fail when the DB is empty, so we have to use a put request.
See https://github.com/google/ml-metadata/issues/28

* Using unbuffered IO to improve the logging latency

* Changed the URI schema for the artifacts

* Cleanup

* Simplified the kubernetes config loading code

* Resolving the feedback
2020-01-14 23:17:32 -08:00
Yuan (Bob) Gong 4a8d262abb Migrate standalone deployment to workload identity on GCP (#2619)
* Script to set up workload identity for standalone deployment

* Migrate tests to run on standalone + workload identity

* Fix test script

* Switch to static GSAs for testing, because they have name length limit

* Add workload identity binding for argo

* Fix argo workload identity bindings

* Remove user-gcp-sa from tests

* Remove use_gcp_secret from xgboost sample

* Allow debugging tests locally

* Wait for policies to take effect

* Update deploy-pipeline-lite.sh

* Update deploy-pipeline-lite.sh

* [WIP] test gcloud auth list with test-runner sa

* Add namespace

* test again

* Use new image builder

* test again

* Remove debug code

* Remove usages of use_gcp_secret

* Fix unit test and tensorboard pod template

* Add debug code again to test

* Try waiting until workload identity bindings are ready

* Fix some other samples

* Fix parameterized tfx oss sample

* Add retry to image building

* Try fixing tfx oss sample

* Fix compiled tfx oss sample

* Update all google/cloud-sdk to latest

* Try fixing parameterized tfx oss sample again

* Also verify pipeline-runner ksa is working

* Fix parameterized_tfx_oss sample

* Update gcp-workload-identity-setup.sh

* Revert unneeded change

* Pin to new google/cloud-sdk

* Remove wrongly committed binaries
2019-12-16 22:05:58 -08:00
IronPan 0711566754 Build inverse proxy image as part of the presubmit test (#2187)
* small fixes

* add

* Delete Makefile
2019-09-21 16:53:23 -07:00
Alexey Volkov 570141d7c9 Make wget quieter (#2069) 2019-09-09 14:32:54 -07:00
Kirin Patel 41b394b045 Add e2e visualization tests (#1981)
* Created visualization_api_test.go

* Updated BUILD.bazel files

* Removed clean_up from e2e test

* Revert "Removed clean_up from e2e test"

This reverts commit 82fd4f5a00.

* Update e2e tests to build visualizationserver and viewer-crd

* Fix bug where wrong image is set

* Fixed incorrect image names

* Fixed additional instance of incorrect image names
2019-08-30 13:54:10 -07:00
Yuan (Bob) Gong 8e53eb43ad Move postsubmit tests to lite deployment (#1939)
* Move postsubmit tests to lite deployment

* Reduce verbose logs by wget

* Add ignored files

* add test temporary file to gitignore
2019-08-23 14:34:26 -07:00
Yuan (Bob) Gong d11fae78d8 Use KFP lite deployment for presubmit tests (#1808)
* Refactor presubmit-tests-with-pipeline-deployment.sh so that it can be run from a different project

* Simplify getting service account from cluster.

* Migrate presubmit-tests-with-pipeline-deployment.sh to use kfp
lightweight deployment.

* Add option to cache built images to make debugging faster.

* Fix cluster set up

* Copy image builder image instead of granting permission

* Add missed yes command

* fix stuff

* Let other usages of image-builder image become configurable

* let test workflow use image builder image

* Fix permission issue

* Hide irrelevant error logs

* Use shared service account key instead

* Move test manifest to test folder

* Move build-images.sh to a different script file

* Update README.md

* add cluster info dump

* Use the same cluster resources as kubeflow deployment

* Remove cluster info dump

* Add timing to test log

* cleaned up code

* fix tests

* address cr comments

* Address cr comments

* Enable image caching to improve retest speed
2019-08-20 17:25:20 -07:00