pipelines

Author	SHA1	Message	Date
Chen Sun	90ed23d183	chore: exclude `sdk-` tags when preparing base deployment for upgrade test (#9477 ) * exclude `sdk-` tags when preparing base deployment for upgrade test * Update deploy-pipeline-lite.sh * Update deploy-pipeline-lite.sh * Update upgrade-tests.sh	2023-05-23 00:08:35 +00:00
Yuan Gong	697c041c36	test: temporarily skip verify workload identity binding	2021-07-15 05:14:07 +00:00
Alexey Volkov	cc83e1089b	Assigned copyright to the project authors (#5587 )	2021-05-05 13:53:22 +08:00
Yuan (Bob) Gong	45a91f6699	feat(deployment): GCP managed storage - detailed instructions to set up workload identity bindings before deployment (#4232 ) * feat: allow creating workload identity bindings before deploying KFP * more instructions * fix formatting * fixes * Update doc ref * fix storage role * fix storage role * add viewer KSA to standalone manifest * fix missing configmap * update documentation	2020-07-16 23:13:00 -07:00
dushyanthsc	bc3c59aea1	MetadataStore: Update to release metadata-envoy in each release (#4026 )	2020-06-23 19:07:17 -07:00
Yuan (Bob) Gong	39805acc9e	[Manifest] Use kustomize native image transformer to override image (#3776 ) * [Manifest] Use kustomize native image transformer to override image * Revert unintended changes * Fix kustomization.yaml location * Fix inverse proxy image	2020-05-18 21:23:36 -07:00
Renmin	db8042a846	Fix test which uses Kustomize edit image but can't work with valueRef (#3572 ) pass upgrade / installation test. submitting. now. The e2e test fails but not due to this PR. Submit this PR to unlock KIR side	2020-04-21 20:10:41 +08:00
Yuan (Bob) Gong	9067184815	Fix concurrent IAM policy changes flakiness (#3504 )	2020-04-14 00:49:10 -07:00
Yuan (Bob) Gong	2500812914	[Testing] Reduce image build flakiness by share and retry cloudbuild jobs (#3492 ) * Let presubmit tests share and retry cloudbuild * Fix ongoing_build_ids * Add retry for workload identity binding * Fix build id * fix * Parralelize image buidling for api server and others * Fix * fix * fix * Fix again * Allow retry twice instead * Update deploy-pipeline-lite.sh * Update batch_build.yaml * Refine log and retry tests * Update log and retry * Update and retry * Update build-images.sh	2020-04-13 20:33:11 -07:00
Jiaxiao Zheng	bcb16ef62d	[Test] fix upgrade test (#3469 ) * update deploy-pipeline-lite.sh * fix * fix? * revert	2020-04-07 23:11:44 -07:00
Rui Fang	85257a06ea	[Manifest] Cache - MKP deployment (#3430 ) * Initial execution cache This commit adds initial execution cache service. Including http service and execution key generation. * fix master * Add cache manifests for mkp deployment * revert go.sum * Add helm on delete policy for cache deployer job * Change cache deployer job to statefulset * remove unnecessary cluster role * seperate clusterrole and role * add role and rolebinding to mkp * change secret role to clusterrole * Add cloudsql support to cache * fix comma * Change cache secret clusterrole to role * Adjust sequences of resources * Update values and schema * remove extra tab * Change statefulset to job * Add pod delete permission to cache deployer role * Test changing cache deployer job to deployment * remove extra permission * remove statefulset check	2020-04-06 16:53:19 -07:00
Rui Fang	8e137a1ba6	[Manifest] Cache - Enable cache and cache deployer in base kustomization file (#3376 ) * Initial execution cache This commit adds initial execution cache service. Including http service and execution key generation. * fix master * Change cache deployer job to stateful set * Delete cache deployer job * Delete cache deployer job after it completes * minor fix * fix indention * Change cache deployer job to statefulset * Remove extra cluster role for cache deployer * remove cache in base kustomize file for upgrade test * minor fix * Enable cache and cache-deployer in base kustomization file * fix * fix * test * test * test * Refactor cluster scope resources * refactor * Add namespace for sa * Fix * Add crds folder to cluster kustomization yaml * namespace change * fix * fix * fix * update test * Rename cluster to cluster-scoped-resource * test adding namespace in kustomization file * revert namespace for clusterrolebinding * fix * Add db_name in cache_deployment manifest * rename * change secret cluster role to role	2020-04-02 14:37:04 -07:00
Yuan (Bob) Gong	5391e88fbc	[Testing] KFP standalone test infra for upgradability (#1971 ) * Implement upgrade test * mark upgrade-tests.sh as executable * Fix comments * Base upgrade_test_setup.yaml * e2e integration of upgrade test * Fix entrypoint argument * Fix e2e workflow yaml * Fix run_test.sh argument processing * Fix no closing backtick * Restrucutre upgrade_test.go to focus the test on upgrade verification * clean up code * Clean up after upgrade test when it is run in integration tests. * Include pipeline tests in upgrade test * Reorder tests * Add upgrade test coverage for run api resources * Add job api resource coverage in upgrade test & refactored upgrade test * Fix add missing step in upgrade test * Fix BUILD.bazel * Fix upgrade_test.go * Try to fix upgrade test failure * Fix hard coded namespace * Sync upgrade-tests.sh with new changes in presubmit-tests-with-pipeline-deployment.sh * Update upgrade test * Remove redundant code * Fix integration test exit code * Fix trigger interval second mismatch	2020-03-09 16:53:37 -07:00
Rui Fang	ccdb885519	[Backend]Initial execution cache (#3036 ) * Initial execution cache This commit adds initial execution cache service. Including http service and execution key generation. * Add initial server logic * Add const * Change folder name * Change execution key name * Fix unit test * Add Dockerfile and OWNERS file This commit adds Dockerfile for building source code and OWNERS file for easy review. This commit also renames some functions. * fix go.sum This PR fixes changes on go.sum * Add local deployment scripts This commit adds local deployment scripts which can deploy cache service to an existing cluster with KFP installed. * refactor src code * Add standalone deployment scripts and yamls This commit adds execution cache deployment scripts and yaml files in KFP standalone deployment. Including a deployer which will generate the certification and mutatingwebhookconfiguration and execution cache deployment. * Minor fix * Add execution cache image build in test folder * fix test cloudbuild * Fix cloudbuild * Add execution cache deployer image to test folder * Add copyright * Fix deployer build * Add license for execution cache and cloudbuild for execution cache images This commit adds licenses for execution cache source code. Also adds cloud build step for building cache image and cache deployer image. Change the manifest name based on changed image. * Refactor license intermediate data * Fix execution cache image manifest * Typo fix for cache and cache deployer images * Add arguments in ca generation scripts and change deployer base image to google/cloud * minor fix * fix arg * Mirror source code with MPL in execution_cache image * Minor fix * minor refactor on error handling * Refactor cache source code, Docker image and manifest * Fix variable names * Add images in .release.cloudbuild.yaml * Change execution_cache to generic name * revice readme * Move deployer job out of upgrade script * fix tests * fix tests * Seperate cache service and cache deployer job * mysql set up * Delete cache service in manifest, only test in presubmit tests * fix * fix presubmit tests * fix * fix * revert unnecessary change * fix cache image tag * change image gcr to ml-pipeline-test * Remove namespace in standalone manifest and add to test manifest	2020-03-03 16:13:47 -08:00
Yuan (Bob) Gong	30b79255bd	[Testing] Reduce flakiness caused by iam bindings (#3008 ) * Add retry to iam policy bindings * Add retry for iam policy changes to reduce flakiness	2020-02-07 00:05:43 -08:00
IronPan	a0a39a5eda	Install application CRD and add pipeline application CR to pipeline standalone (#2585 ) * install application CRD and add pipeline application CR * add labels and let application manager to set ownerref * fix * address comments * update test * update test * update test * update readme * fix test * update * update * update * Update application-crd.yaml * fix * fix * Update .release.cloudbuild.yaml * update tests * Update kustomization.yaml * Update deploy-pipeline-lite.sh * Update ml-pipeline-viewer-crd-sa.yaml * update tests * update tests * update tests	2020-01-16 09:20:25 -08:00
Alexey Volkov	dc34a3568d	Service - Metadata writer (#2674 ) * Metadata writer * Added sleeper-based metadata writer * Sleeper * First working draft * Added properties to Executions Artifacts and Contexts Also added attributions. context_id is now stored as label. * Prefix the execution type names * Ignoring TFX pods * Fixed the deployment container spec * Cleaned up the file and added deployment spec * Added the Kubernetes deployment * Added startup logging * Made python output unbuffered * Fixed None exception * Formatting exceptions * Prefixing the log message * Improved handling non-S3 artifacts * Logging input artifacts * Extracted code to the link_execution_to_input_artifact function * Setting execution's pipeline_name to workflow name * Adding annotation with input artifact IDs * Running infinitely * Added component version to execution type name * Marking metadata as written even for failed pods * Cleaned up some comments * Do not fail when upstream artifact is missing * Change the completion detection logic Waiting for Argo's "completed=true" instead of Kubernetes' "phase: Completed" introduced delays that lead to problems with missing input artifacts. This changes allows us to log the outpuyt artifacts earlier. * Added Dockerfile * Added release deployment manifest * Added OWNERS * Switching to using MLMD service instead of direct DB access * Adding licenses to the image * Pinned Python's minor version * Moved code to /backend/metadata_writer Moved manifest to /manifests * Added image building to CloudBuild * Added Metadata Writer to release CloudBuild * Added Metadata Writer to test scripts * Finished the kustomization manifests * Added Metadata Writer to marketplace manifests * Added ServiceAccount, Role and RoleBinding for MW * Fixed merge conflict * Removed the debug deployment * Forgot to add the chart templates for the SA and roles * Specified the service account * Switched to watching a single namespace * Resolved feedback Removed dev deployment comment from python code. Added license. Fixed the range of kubernetes package versions. * More review fixes * Extracted the metadata helper functions * Improved the error message when context type is unexpected * Fixed the import * Checking the connection to MLMD The latest tests started to have connection problems - "failed to connect to all addresses" and "Failed to pick subchannel". * Improved the MLMD connection error logging * Try creating MLMD client on each retry and using a different request * Changed the MLMD connection check request All get requests fail when the DB is empty, so we have to use a put request. See https://github.com/google/ml-metadata/issues/28 * Using unbuffered IO to improve the logging latency * Changed the URI schema for the artifacts * Cleanup * Simplified the kubernetes config loading code * Resolving the feedback	2020-01-14 23:17:32 -08:00
Yuan (Bob) Gong	4a8d262abb	Migrate standalone deployment to workload identity on GCP (#2619 ) * Script to set up workload identity for standalone deployment * Migrate tests to run on standalone + workload identity * Fix test script * Switch to static GSAs for testing, because they have name length limit * Add workload identity binding for argo * Fix argo workload identity bindings * Remove user-gcp-sa from tests * Remove use_gcp_secret from xgboost sample * Allow debugging tests locally * Wait for policies to take effect * Update deploy-pipeline-lite.sh * Update deploy-pipeline-lite.sh * [WIP] test gcloud auth list with test-runner sa * Add namespace * test again * Use new image builder * test again * Remove debug code * Remove usages of use_gcp_secret * Fix unit test and tensorboard pod template * Add debug code again to test * Try waiting until workload identity bindings are ready * Fix some other samples * Fix parameterized tfx oss sample * Add retry to image building * Try fixing tfx oss sample * Fix compiled tfx oss sample * Update all google/cloud-sdk to latest * Try fixing parameterized tfx oss sample again * Also verify pipeline-runner ksa is working * Fix parameterized_tfx_oss sample * Update gcp-workload-identity-setup.sh * Revert unneeded change * Pin to new google/cloud-sdk * Remove wrongly commited binaries	2019-12-16 22:05:58 -08:00
IronPan	0711566754	Build inverse proxy image as part of the presubmit test (#2187 ) * small fixes * add * Delete Makefile	2019-09-21 16:53:23 -07:00
Alexey Volkov	570141d7c9	Make wget quieter (#2069 )	2019-09-09 14:32:54 -07:00
Kirin Patel	41b394b045	Add e2e visualization tests (#1981 ) * Created visualization_api_test.go * Updated BUILD.bazel files * Removed clean_up from e2e test * Revert "Removed clean_up from e2e test" This reverts commit `82fd4f5a00`. * Update e2e tests to build visualizationserver and viewer-crd * Fix bug where wrong image is set * Fixed incorrect image names * Fixed additional instance of incorrect image names	2019-08-30 13:54:10 -07:00
Yuan (Bob) Gong	8e53eb43ad	Move postsubmit tests to lite deployment (#1939 ) * Move postsubmit tests to lite deployment * Reduce verbose logs by wget * Add ignored files * add test temporary file to gitignore	2019-08-23 14:34:26 -07:00
Yuan (Bob) Gong	d11fae78d8	Use KFP lite deployment for presubmit tests (#1808 ) * Refactor presubmit-tests-with-pipeline-deployment.sh so that it can be run from a different project * Simplify getting service account from cluster. * Migrate presubmit-tests-with-pipeline-deployment.sh to use kfp lightweight deployment. * Add option to cache built images to make debugging faster. * Fix cluster set up * Copy image builder image instead of granting permission * Add missed yes command * fix stuff * Let other usages of image-builder image become configurable * let test workflow use image builder image * Fix permission issue * Hide irrelevant error logs * Use shared service account key instead * Move test manifest to test folder * Move build-images.sh to a different script file * Update README.md * add cluster info dump * Use the same cluster resources as kubeflow deployment * Remove cluster info dump * Add timing to test log * cleaned up code * fix tests * address cr comments * Address cr comments * Enable image caching to improve retest speed	2019-08-20 17:25:20 -07:00

23 Commits