* exclude `sdk-` tags when preparing base deployment for upgrade test
* Update deploy-pipeline-lite.sh
* Update deploy-pipeline-lite.sh
* Update upgrade-tests.sh
* Initial execution cache
This commit adds the initial execution cache service, including the HTTP service
and execution key generation.
* fix master
* Add cache manifests for mkp deployment
* revert go.sum
* Add helm on delete policy for cache deployer job
* Change cache deployer job to statefulset
* remove unnecessary cluster role
* separate clusterrole and role
* add role and rolebinding to mkp
* change secret role to clusterrole
* Add cloudsql support to cache
* fix comma
* Change cache secret clusterrole to role
* Adjust sequences of resources
* Update values and schema
* remove extra tab
* Change statefulset to job
* Add pod delete permission to cache deployer role
* Test changing cache deployer job to deployment
* remove extra permission
* remove statefulset check
* Initial execution cache
This commit adds the initial execution cache service, including the HTTP service
and execution key generation.
* fix master
* Change cache deployer job to stateful set
* Delete cache deployer job
* Delete cache deployer job after it completes
* minor fix
* fix indentation
* Change cache deployer job to statefulset
* Remove extra cluster role for cache deployer
* remove cache in base kustomize file for upgrade test
* minor fix
* Enable cache and cache-deployer in base kustomization file
* fix
* fix
* test
* test
* test
* Refactor cluster scope resources
* refactor
* Add namespace for sa
* Fix
* Add crds folder to cluster kustomization yaml
* namespace change
* fix
* fix
* fix
* update test
* Rename cluster to cluster-scoped-resource
* test adding namespace in kustomization file
* revert namespace for clusterrolebinding
* fix
* Add db_name in cache_deployment manifest
* rename
* change secret cluster role to role
* Implement upgrade test
* mark upgrade-tests.sh as executable
* Fix comments
* Base upgrade_test_setup.yaml
* e2e integration of upgrade test
* Fix entrypoint argument
* Fix e2e workflow yaml
* Fix run_test.sh argument processing
* Fix no closing backtick
* Restructure upgrade_test.go to focus the test on upgrade verification
* clean up code
* Clean up after upgrade test when it is run in integration tests.
* Include pipeline tests in upgrade test
* Reorder tests
* Add upgrade test coverage for run api resources
* Add job api resource coverage in upgrade test & refactored upgrade test
* Fix add missing step in upgrade test
* Fix BUILD.bazel
* Fix upgrade_test.go
* Try to fix upgrade test failure
* Fix hard coded namespace
* Sync upgrade-tests.sh with new changes in presubmit-tests-with-pipeline-deployment.sh
* Update upgrade test
* Remove redundant code
* Fix integration test exit code
* Fix trigger interval second mismatch
* Initial execution cache
This commit adds the initial execution cache service, including the HTTP service
and execution key generation.
* Add initial server logic
* Add const
* Change folder name
* Change execution key name
* Fix unit test
* Add Dockerfile and OWNERS file
This commit adds Dockerfile for building source code and OWNERS file for
easy review. This commit also renames some functions.
* fix go.sum
This PR fixes changes to go.sum.
* Add local deployment scripts
This commit adds local deployment scripts which can deploy cache service
to an existing cluster with KFP installed.
* refactor src code
* Add standalone deployment scripts and yamls
This commit adds execution cache deployment scripts and YAML files for the
KFP standalone deployment, including a deployer that generates the
certificate, the MutatingWebhookConfiguration, and the execution cache
deployment.
* Minor fix
* Add execution cache image build in test folder
* fix test cloudbuild
* Fix cloudbuild
* Add execution cache deployer image to test folder
* Add copyright
* Fix deployer build
* Add license for execution cache and cloudbuild for execution cache
images
This commit adds licenses for the execution cache source code. It also adds a
Cloud Build step for building the cache image and the cache deployer image,
and changes the manifest name to match the changed image.
* Refactor license intermediate data
* Fix execution cache image manifest
* Typo fix for cache and cache deployer images
* Add arguments in ca generation scripts and change deployer base image to google/cloud
* minor fix
* fix arg
* Mirror source code with MPL in execution_cache image
* Minor fix
* minor refactor on error handling
* Refactor cache source code, Docker image and manifest
* Fix variable names
* Add images in .release.cloudbuild.yaml
* Change execution_cache to generic name
* revise readme
* Move deployer job out of upgrade script
* fix tests
* fix tests
* Separate cache service and cache deployer job
* mysql set up
* Delete cache service in manifest, only test in presubmit tests
* fix
* fix presubmit tests
* fix
* fix
* revert unnecessary change
* fix cache image tag
* change image gcr to ml-pipeline-test
* Remove namespace in standalone manifest and add to test manifest
* Metadata writer
* Added sleeper-based metadata writer
* Sleeper
* First working draft
* Added properties to Executions Artifacts and Contexts
Also added attributions.
context_id is now stored as a label.
* Prefix the execution type names
* Ignoring TFX pods
* Fixed the deployment container spec
* Cleaned up the file and added deployment spec
* Added the Kubernetes deployment
* Added startup logging
* Made python output unbuffered
* Fixed None exception
* Formatting exceptions
* Prefixing the log message
* Improved handling non-S3 artifacts
* Logging input artifacts
* Extracted code to the link_execution_to_input_artifact function
* Setting execution's pipeline_name to workflow name
* Adding annotation with input artifact IDs
* Running infinitely
* Added component version to execution type name
* Marking metadata as written even for failed pods
* Cleaned up some comments
* Do not fail when upstream artifact is missing
* Change the completion detection logic
Waiting for Argo's "completed=true" instead of Kubernetes' "phase: Completed" introduced delays that led to problems with missing input artifacts.
This change allows us to log the output artifacts earlier.
* Added Dockerfile
* Added release deployment manifest
* Added OWNERS
* Switching to using MLMD service instead of direct DB access
* Adding licenses to the image
* Pinned Python's minor version
* Moved code to /backend/metadata_writer
Moved manifest to /manifests
* Added image building to CloudBuild
* Added Metadata Writer to release CloudBuild
* Added Metadata Writer to test scripts
* Finished the kustomization manifests
* Added Metadata Writer to marketplace manifests
* Added ServiceAccount, Role and RoleBinding for MW
* Fixed merge conflict
* Removed the debug deployment
* Forgot to add the chart templates for the SA and roles
* Specified the service account
* Switched to watching a single namespace
* Resolved feedback
Removed dev deployment comment from python code.
Added license.
Fixed the range of kubernetes package versions.
* More review fixes
* Extracted the metadata helper functions
* Improved the error message when context type is unexpected
* Fixed the import
* Checking the connection to MLMD
The latest tests started to have connection problems - "failed to connect to all addresses" and "Failed to pick subchannel".
* Improved the MLMD connection error logging
* Try creating MLMD client on each retry and using a different request
* Changed the MLMD connection check request
All get requests fail when the DB is empty, so we have to use a put request.
See https://github.com/google/ml-metadata/issues/28
* Using unbuffered IO to improve the logging latency
* Changed the URI schema for the artifacts
* Cleanup
* Simplified the kubernetes config loading code
* Resolving the feedback
* Script to set up workload identity for standalone deployment
* Migrate tests to run on standalone + workload identity
* Fix test script
* Switch to static GSAs for testing, because GSA names have a length limit
* Add workload identity binding for argo
* Fix argo workload identity bindings
* Remove user-gcp-sa from tests
* Remove use_gcp_secret from xgboost sample
* Allow debugging tests locally
* Wait for policies to take effect
* Update deploy-pipeline-lite.sh
* Update deploy-pipeline-lite.sh
* [WIP] test gcloud auth list with test-runner sa
* Add namespace
* test again
* Use new image builder
* test again
* Remove debug code
* Remove usages of use_gcp_secret
* Fix unit test and tensorboard pod template
* Add debug code again to test
* Try waiting until workload identity bindings are ready
* Fix some other samples
* Fix parameterized tfx oss sample
* Add retry to image building
* Try fixing tfx oss sample
* Fix compiled tfx oss sample
* Update all google/cloud-sdk to latest
* Try fixing parameterized tfx oss sample again
* Also verify pipeline-runner ksa is working
* Fix parameterized_tfx_oss sample
* Update gcp-workload-identity-setup.sh
* Revert unneeded change
* Pin to new google/cloud-sdk
* Remove wrongly committed binaries
* Created visualization_api_test.go
* Updated BUILD.bazel files
* Removed clean_up from e2e test
* Revert "Removed clean_up from e2e test"
This reverts commit 82fd4f5a00.
* Update e2e tests to build visualizationserver and viewer-crd
* Fix bug where wrong image is set
* Fixed incorrect image names
* Fixed additional instance of incorrect image names
* Refactor presubmit-tests-with-pipeline-deployment.sh so that it can be run from a different project
* Simplify getting service account from cluster.
* Migrate presubmit-tests-with-pipeline-deployment.sh to use kfp
lightweight deployment.
* Add option to cache built images to make debugging faster.
* Fix cluster set up
* Copy image builder image instead of granting permission
* Add missed yes command
* fix stuff
* Let other usages of image-builder image become configurable
* let test workflow use image builder image
* Fix permission issue
* Hide irrelevant error logs
* Use shared service account key instead
* Move test manifest to test folder
* Move build-images.sh to a different script file
* Update README.md
* add cluster info dump
* Use the same cluster resources as kubeflow deployment
* Remove cluster info dump
* Add timing to test log
* cleaned up code
* fix tests
* address cr comments
* Address cr comments
* Enable image caching to improve retest speed