* Initial execution cache
This commit adds the initial execution cache service, including the HTTP
service and execution key generation.
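The execution key generation mentioned above can be sketched as hashing the deterministic parts of a step's container spec. The function and field names below are illustrative assumptions, not the service's actual implementation (which lives in the Go source):

```python
import hashlib
import json

def execution_cache_key(container_spec: dict) -> str:
    # Serialize the spec canonically (sorted keys, no whitespace) so that
    # semantically identical specs always hash to the same key.
    canonical = json.dumps(container_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

spec = {"image": "python:3.7", "command": ["echo"], "args": ["hello"]}
print(execution_cache_key(spec))
```

Because the serialization sorts keys, two specs that differ only in field order produce the same cache key, while any change to image, command, or args produces a different one.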
* Add initial server logic
* Add const
* Change folder name
* Change execution key name
* Fix unit test
* Add Dockerfile and OWNERS file
This commit adds a Dockerfile for building the source code and an OWNERS
file for easier review. It also renames some functions.
* fix go.sum
This commit fixes the go.sum changes.
* Add local deployment scripts
This commit adds local deployment scripts which can deploy cache service
to an existing cluster with KFP installed.
* refactor src code
* Add standalone deployment scripts and yamls
This commit adds execution cache deployment scripts and YAML files for
the KFP standalone deployment, including a deployer that generates the
certificate, the MutatingWebhookConfiguration, and the execution cache
deployment.
* Minor fix
* Add execution cache image build in test folder
* fix test cloudbuild
* Fix cloudbuild
* Add execution cache deployer image to test folder
* Add copyright
* Fix deployer build
* Add license for execution cache and cloudbuild for execution cache
images
This commit adds licenses for the execution cache source code and adds
Cloud Build steps for building the cache and cache deployer images. It
also changes the manifest names to match the changed images.
* Refactor license intermediate data
* Fix execution cache image manifest
* Typo fix for cache and cache deployer images
* Add arguments in ca generation scripts and change deployer base image to google/cloud
* minor fix
* fix arg
* Mirror source code with MPL in execution_cache image
* Minor fix
* minor refactor on error handling
* Refactor cache source code, Docker image and manifest
* Fix variable names
* Add images in .release.cloudbuild.yaml
* Change execution_cache to generic name
* revise README
* Move deployer job out of upgrade script
* fix tests
* fix tests
* Separate cache service and cache deployer job
* MySQL setup
* Delete cache service in manifest, only test in presubmit tests
* fix
* fix presubmit tests
* fix
* fix
* revert unnecessary change
* fix cache image tag
* change image gcr to ml-pipeline-test
* Remove namespace in standalone manifest and add to test manifest
* Metadata writer
* Added sleeper-based metadata writer
* Sleeper
* First working draft
* Added properties to Executions, Artifacts, and Contexts
Also added attributions.
context_id is now stored as a label.
* Prefix the execution type names
* Ignoring TFX pods
* Fixed the deployment container spec
* Cleaned up the file and added deployment spec
* Added the Kubernetes deployment
* Added startup logging
* Made python output unbuffered
* Fixed None exception
* Formatting exceptions
* Prefixing the log message
* Improved handling non-S3 artifacts
* Logging input artifacts
* Extracted code to the link_execution_to_input_artifact function
* Setting execution's pipeline_name to workflow name
* Adding annotation with input artifact IDs
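Recording input artifact IDs on a pod can be sketched as building an annotation payload like the one below; the annotation key is a hypothetical name for illustration, not necessarily the one Metadata Writer actually uses:

```python
import json

def input_artifact_annotation(artifact_ids: list) -> dict:
    # Hypothetical annotation key; the real Metadata Writer key may differ.
    # Kubernetes annotation values must be strings, so the ID list is
    # JSON-encoded (sorted for stable output).
    return {"example.kubeflow.org/input-artifact-ids": json.dumps(sorted(artifact_ids))}

print(input_artifact_annotation([12, 7, 33]))
```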
* Running infinitely
* Added component version to execution type name
* Marking metadata as written even for failed pods
* Cleaned up some comments
* Do not fail when upstream artifact is missing
* Change the completion detection logic
Waiting for Argo's "completed=true" label instead of Kubernetes' "phase: Completed" introduced delays that led to problems with missing input artifacts.
This change allows us to log the output artifacts earlier.
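The two completion signals compared above can be sketched as follows. The field and label names follow standard Kubernetes and Argo conventions; the helpers themselves are illustrative, not Metadata Writer's actual code:

```python
def pod_phase_completed(pod: dict) -> bool:
    # Kubernetes reports a terminal phase as soon as all containers exit.
    return pod.get("status", {}).get("phase") in ("Succeeded", "Failed")

def argo_label_completed(pod: dict) -> bool:
    # Argo adds this label later, after post-processing the pod, which is
    # why waiting on it delayed artifact logging.
    labels = pod.get("metadata", {}).get("labels", {})
    return labels.get("workflows.argoproj.io/completed") == "true"

pod = {"status": {"phase": "Succeeded"}, "metadata": {"labels": {}}}
print(pod_phase_completed(pod), argo_label_completed(pod))
```

For a just-finished pod, the phase check fires first while the Argo label is still absent, which is the gap the commit exploits.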
* Added Dockerfile
* Added release deployment manifest
* Added OWNERS
* Switching to using MLMD service instead of direct DB access
* Adding licenses to the image
* Pinned Python's minor version
* Moved code to /backend/metadata_writer
Moved manifest to /manifests
* Added image building to CloudBuild
* Added Metadata Writer to release CloudBuild
* Added Metadata Writer to test scripts
* Finished the kustomization manifests
* Added Metadata Writer to marketplace manifests
* Added ServiceAccount, Role and RoleBinding for MW
* Fixed merge conflict
* Removed the debug deployment
* Forgot to add the chart templates for the SA and roles
* Specified the service account
* Switched to watching a single namespace
* Resolved feedback
Removed dev deployment comment from python code.
Added license.
Fixed the range of kubernetes package versions.
* More review fixes
* Extracted the metadata helper functions
* Improved the error message when context type is unexpected
* Fixed the import
* Checking the connection to MLMD
The latest tests started to have connection problems - "failed to connect to all addresses" and "Failed to pick subchannel".
* Improved the MLMD connection error logging
* Try creating MLMD client on each retry and using a different request
* Changed the MLMD connection check request
All get requests fail when the DB is empty, so we have to use a put request.
See https://github.com/google/ml-metadata/issues/28
* Using unbuffered IO to improve the logging latency
* Changed the URI schema for the artifacts
* Cleanup
* Simplified the kubernetes config loading code
* Resolving the feedback
* Created visualization_api_test.go
* Updated BUILD.bazel files
* Removed clean_up from e2e test
* Revert "Removed clean_up from e2e test"
This reverts commit 82fd4f5a00.
* Update e2e tests to build visualizationserver and viewer-crd
* Fix bug where wrong image is set
* Fixed incorrect image names
* Fixed additional instance of incorrect image names
* Refactor presubmit-tests-with-pipeline-deployment.sh so that it can be run from a different project
* Simplify getting service account from cluster.
* Migrate presubmit-tests-with-pipeline-deployment.sh to use the KFP
lightweight deployment.
* Add option to cache built images to make debugging faster.
* Fix cluster set up
* Copy image builder image instead of granting permission
* Add missed yes command
* fix stuff
* Let other usages of image-builder image become configurable
* let test workflow use image builder image
* Fix permission issue
* Hide irrelevant error logs
* Use shared service account key instead
* Move test manifest to test folder
* Move build-images.sh to a different script file
* Update README.md
* add cluster info dump
* Use the same cluster resources as kubeflow deployment
* Remove cluster info dump
* Add timing to test log
* cleaned up code
* fix tests
* address cr comments
* Address cr comments
* Enable image caching to improve retest speed