* Change enqueue base delay in non-error mode to 1 second
* added Terminating constants
* used terminating constants in store layer
* modified comments
* initial work
* small fixes
* updated tests and how parameters are set
* try to fix test
* check without adding missing test
* fixed small typo
* test changes
* updated config
* typo
* updated after feedback
* fixed pointer error
* test to add parameter
* moved to init so removed not needed code
* updated further
* updated tests to also check endtime
* clean up test
* fixed failing test
* fixed the expected test results
* added timezone examples
* further clean up
* fixed time format
* Update params.env
* moved location to cronjobscheduler
* clean up
* set env variable to empty
* reverted back
* changed magic number to a constant
* updated the tests with comment
* added comments on cron expressions
* update naming and return types
* updated to UTC as default
* updated with an alpha notice
* modified validation logic of run and job
* fixed resource manager logic when creating job and run
* removed unused methods, changed to nested if else
* fixed nits
* fixed nits
* fixed nits
* fix(backend): job api -- deletion should succeed when swf not found
* bug reproducing unit test
* fix the bug and pass reproducing unit test
* reproducing integration test
* fix integration test
* clarify error message
* disable job should also succeed, unify term to CR instead of CRD
* fix unit test error
* fix error message
* improve logging
* Set current namespace in local KFP context if running from notebook
* Create "~/.config/kfp/" instead of ".config/kfp/"
At first it was assumed the `get_user_namespace` command would be executed from the home directory.
* Create local context file if it doesn't exist during set_user_namespace
* Grab path from LOCAL_KFP_CONTEXT when creating folder
Instead of hardcoding the os.makedirs path to `~/.config/kfp`, it now grabs it from LOCAL_KFP_CONTEXT. Also removed path creation in `get_user_namespace`, as that is now handled in `set_user_namespace`. It now also checks whether the path exists rather than the local_context_file, to avoid trying to create `~/.config/kfp/` when the path exists but context.json does not.
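The path handling described above can be sketched as follows. This is an illustrative reconstruction, not the SDK's actual implementation; the `LOCAL_KFP_CONTEXT` constant name comes from the commit messages, while the function body and JSON layout are assumptions.

```python
import json
import os

# Path of the local KFP context file; the constant name LOCAL_KFP_CONTEXT
# is taken from the commit messages, the default value is illustrative.
LOCAL_KFP_CONTEXT = os.path.expanduser("~/.config/kfp/context.json")

def set_user_namespace(namespace, context_file=LOCAL_KFP_CONTEXT):
    """Persist the namespace, creating the config directory if needed."""
    config_dir = os.path.dirname(context_file)
    # Check the directory, not the file: the context.json may be missing
    # while ~/.config/kfp/ already exists.
    if not os.path.exists(config_dir):
        os.makedirs(config_dir)
    context = {}
    if os.path.exists(context_file):
        with open(context_file) as f:
            context = json.load(f)
    context["namespace"] = namespace
    with open(context_file, "w") as f:
        json.dump(context, f)
```

Expanding the user's home directory (`~`) before creating the folder is the fix for the "`.config/kfp/` instead of `~/.config/kfp/`" bug noted above.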
* add multi-user setting to healthz api
* Add http prefix to health api url
* move healthz api call to own function and fix multi_user boolean
* Fix HEALTH_PATH declaration
* Move check to Client __init__ and change get_kfp_healthz to avoid breaking in case of old apiserver image
* Add multi_user to frontend healthz
* Expose multi_user in frontend and add integration test
* Fix integration test
* Fix host hardcoding and error handling
* Handle empty API response, check if API up to date
* Fix response return
* remove API check due to empty response
* retry API call if first response empty
* retry getting healthz api if no response
* change health_api to https
The healthz_api has been returning empty responses, which might be caused by sending an http request to an https endpoint. Although `requests` handles redirects, this commit tests whether that solves the issue.
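The empty-response retry loop added in the commits above can be sketched like this. The function name mirrors the commit messages; the retry parameters and the injected `fetch` callable are illustrative, not the client's real signature.

```python
import time

def get_kfp_healthz(fetch, max_attempts=5, sleep_seconds=1):
    """Retry the healthz call while the response is empty.

    `fetch` is any zero-argument callable returning the parsed healthz
    response, or None/empty when the API answered with an empty body.
    """
    for attempt in range(max_attempts):
        response = fetch()
        if response:
            return response
        if attempt < max_attempts - 1:
            time.sleep(sleep_seconds)
    # Mirrors the "TimeoutError for retried healthz api" commit.
    raise TimeoutError("healthz endpoint returned no response")
```

Injecting `fetch` keeps the retry policy testable without a live API server, which is how an empty first response followed by a valid one can be simulated.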
* Add some debug info to healthz exception
* add url to debug and lower retries to 1
* Use api_client to get healthz data
* Debug info for API response
* Follow API redirect history
* Fix indentation
* Add healthz proto
* Try getting healthz api with new python backend
* Add installation of kfp_server_api in tests
* Fix incorrect setup location
* Replace old .get with new http backend .multi_user
* Code clean up
* Small fixes and TimeoutError for retried healthz api
* Remove changes to go dependencies
* Send empty proto request and fix exception client
* Remove unused commit_sha and tag_name
* [Backend] Return proper error codes for failures during auth
* [Backend] Implement helpers to initialize a SubjectAccessReview client
In preparation of SubjectAccessReview, we implement some helpers to
create a new Kubernetes Authorization clientset and return the
SubjectAccessReview client.
We also define some fake clients to be used by future tests.
* [Backend] Introduce RBAC-related constants
In preparation of SubjectAccessReview, introduce RBAC groups, resources,
and verbs.
* [Backend] Extend managers with a SubjectAccessReviewClient
* [Backend] Refactor the authorization mechanism for requests
Authorization should be based on performing some action on a resource
living in a namespace. This commit refactors the authorization utilities
to reflect this and perform SubjectAccessReview.
This commit also deletes some tests based on old authn/authz mechanism.
A following commit will fix/extend the tests for the new mechanism.
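The backend here is Go, but the shape of the refactored check — authorize a user to perform a verb on a resource in a namespace, via an injectable SubjectAccessReview client — can be sketched language-neutrally. The fake client below is analogous to the fake clients the commits add for tests; all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ResourceAttributes:
    """The richer attribute set passed for authorization (illustrative)."""
    namespace: str
    verb: str       # e.g. "get", "list", "create", "delete"
    group: str      # e.g. an API group such as "pipelines.kubeflow.org"
    resource: str   # e.g. "runs", "jobs"

class FakeSubjectAccessReviewClient:
    """Stand-in for a Kubernetes SubjectAccessReview client, in the
    spirit of the fake clients defined for future tests."""
    def __init__(self, allowed_rules):
        # set of (user, namespace, verb, resource) tuples that are allowed
        self.allowed_rules = allowed_rules

    def create(self, user, attrs):
        return (user, attrs.namespace, attrs.verb, attrs.resource) in self.allowed_rules

def is_authorized(sar_client, user, attrs):
    """Authorize `user` to perform attrs.verb on attrs.resource in attrs.namespace."""
    return sar_client.create(user, attrs)
```

The point of the refactor is visible in the signature: instead of a bare namespace (as with KFAM), every call site supplies the full resource attributes, so RBAC policy can distinguish verbs and resource types.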
* [Backend] Adjust endpoints to pass resource attributes for authz
With KFAM authorization, we passed only the namespace attribute for
authorization. With SubjectAccessReview, we need a richer list of
attributes. Thus, we adjust endpoints to pass request details (resource
attributes) necessary for authorizing the request. We only change the
already authorized endpoints, not introducing any new checks.
* [Backend] Adjust apiserver/server tests to SubjectAccessReview
* [Backend] Purge KFAM
Since we no longer use KFAM, we may as well purge it
* [Backend] Update BUILD files
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
* [Manifests] Extend manifests for SubjectAccessReview
* API Server: Allow creating SubjectAccessReviews
* Add view/edit roles in a multi-user kustomization
* New server API: read run log
- The new server API endpoint (/apis/v1beta1/runs/{run_id}/nodes/{node_id}/log) to fetch run log
- `ARCHIVE_LOG_FILE_NAME` and `ARCHIVE_LOG_PATH_PREFIX` options allow controlling the archive log path
- UI Server fetches logs from server API or directly from k8s depending on `STREAM_LOGS_FROM_SERVER_API` option
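The endpoint path and the option-driven log-source switch described above can be sketched as follows. The endpoint path and the `STREAM_LOGS_FROM_SERVER_API` option name come from the commits; the env-var parsing details are assumptions.

```python
import os

def stream_logs_from_server_api():
    """Read the STREAM_LOGS_FROM_SERVER_API option (named in the commit);
    how the option is parsed here is illustrative."""
    return os.environ.get("STREAM_LOGS_FROM_SERVER_API", "false").lower() == "true"

def run_log_url(run_id, node_id):
    """Build the new read-run-log endpoint path from the commit message."""
    return f"/apis/v1beta1/runs/{run_id}/nodes/{node_id}/log"
```

With the option off, the UI server keeps reading pod logs directly from Kubernetes; with it on, it proxies through the server API (which can also serve archived logs).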
* New server API: read run log
- ml-pipeline rbac update: allow for access to log
* Read run log: enhanced error handling
- log message on Pod access errors
* Read run log: enhanced log archive options
* Code format
* Test update after getPodLogs signature change
* Updated comments after review
* `follow` query parameter in GET /apis/v1beta1/runs/{run_id}/nodes/{node_id}/log
* Env variable friendly config names & comments
- Config options: ARCHIVE_CONFIG_LOG_FILE_NAME, ARCHIVE_CONFIG_LOG_PATH_PREFIX
- Copyright message update
- New endpoint as `v1alpha1`
* Licence updates
- fluent-bit licence inlined
- copyright message updates
* Master merge
- dependency conflicts
* simplified test
* Updated and refactored the tests further
* fixed error in search-and-replace
* test cleanup
* fixed line length
* updated the naming
* updated tests
* removed parameter that is not needed
Previously, Metadata Writer could only store input artifacts, but could not store input parameter arguments (since they were not available).
The SDK can now preserve parameter arguments in Argo template annotation.
The commit makes Metadata Writer extract information from that annotation and record it to MLMD.
Fixes https://github.com/kubeflow/pipelines/issues/4556
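Reading parameter arguments out of the Argo template annotation, as described above, can be sketched like this. The annotation key shown is an assumption for illustration; the real key is whatever the SDK compiler writes onto the template.

```python
import json

# Assumed annotation key for illustration only; the actual key is set by
# the SDK compiler on the Argo template.
PARAMETER_ARGUMENTS_ANNOTATION = "pipelines.kubeflow.org/arguments.parameters"

def extract_parameter_arguments(pod_metadata):
    """Return the input parameter arguments recorded in the annotation,
    or {} when the annotation is absent (e.g. older SDK versions)."""
    annotations = pod_metadata.get("annotations", {})
    raw = annotations.get(PARAMETER_ARGUMENTS_ANNOTATION)
    if not raw:
        return {}
    return json.loads(raw)
```

Returning `{}` for missing annotations keeps Metadata Writer backward compatible with pipelines compiled before the SDK started preserving parameter arguments.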
* updated the version
* updated the serializer
* fixed test
* fixed some more changes
* tested to update versions of k8s packages
* reverted package update
* change in API
* fixed dependencies, need to fix broken tests now
* updated fake client and fixed test due to updates in timestamp.Timestamp
* forgot to update fake client pod
* fixed issue in controller
* tested to update
* updated
* updated controller viewer
* updates to fix go mod vendor
* Updated the client
* updated the golang versions
* missed one docker file update, from 1.11 -> 1.13
* testing to fix persistence agent issues
* Updated after feedback
Co-authored-by: Niklas hansson <niklashansson@Niklass-MacBook-Pro.local>
* update to fetch remote
* forgot to add the description
* fixed merge conflict
* initial work
* fixed test and bug
* updated python client
* clean up
* clean up
* added config default
* fixed bug in API
* moved config value
* reverted to load from config
* clean up
* Update _client.py
* removed unnecessary function and updated after feedback
* forgot to save pipeline.proto
* updated the last parts after feedback
* reverted back to use string and env variable
* updated typo
* fix typo in path
* clean up
* removed option in api
* clean up python part
* typo, can't run test locally
* clean up, problems with local env
* clean up missing differences
* reverted proto files
* further clean up
* clean up
* updated after feedback
* Added tests
* error in my defer statement
* Updated the test
* initial work on exposing the default version of pipeline
* update description
* added missing files
* updated api build, unsure if this is correct ...
* updated after feedback
* clean up
* remove empty line
* started to make the integration test
* added integration test
* fixed build and feedback
* updated the tests
* Updated the test
* new test
* typo
* updated the pipeline default
* updated the pipeline version
* formatting
* error in comparison
* Cache deployer - Using the same kubectl version as the server
Fixes https://github.com/kubeflow/pipelines/issues/4505
* Changed the PATH precedence
* Unquoted the jq output
* Fixed the curl options
* fix(backend): workflow not found error should be permanent
* failing test case
* Fix logic
* fix another case
* Switched to not found error
* not found error should be permanent
* improve license.sh logging
* build: remove our own scripts to comply with pypi package licenses
* Remove unneeded packages when we do not need to handle licensing ourselves
Fixes #3584.
For clusters with existing native Argo Workflows, ml-pipeline logs were polluted
with unnecessary stack traces due to the "missing Run ID label" situation.
Made persistenceagent skip the workflow if it misses the Run ID label, and
added workflow name to previous error message in apiserver side.
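The skip logic described above boils down to a label check before persisting. A minimal sketch, assuming a dict-shaped workflow object; the label key shown is an assumption, the real key is whatever KFP stamps on the workflows it creates.

```python
# Assumed label key for illustration; KFP uses its own Run ID label.
RUN_ID_LABEL = "pipeline/runid"

def should_persist(workflow):
    """Skip workflows that are missing the Run ID label (i.e. native Argo
    workflows not created by KFP) instead of reporting them as errors."""
    labels = workflow.get("metadata", {}).get("labels", {})
    return RUN_ID_LABEL in labels
```

The persistence agent calls a check like this per workflow event; returning `False` means "silently ignore" rather than "retry and log a stack trace".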
* Backend - Caching - Only send cache-enabled pods to the caching webhook
The caching webhook already checks whether the pod is cache-enabled, but this change makes the check happen sooner - even before calling the webhook.
This way the webhook cannot possibly affect any non-KFP pods.
This feature requires API v1 and Kubernetes v1.15, so we use it conditionally.
* Support filtering on Kubernetes v1.15 as well
Fixes #4389 (partially).
When the workflow manifest file is deleted from s3 due to the retention policy, we were
getting this segmentation fault in the next createRun attempt for that pipeline:
```
I0831 06:36:53.916141 1 interceptor.go:29] /api.RunService/CreateRun handler starting
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x148 pc=0x156e140]
goroutine 183 [running]:
github.com/kubeflow/pipelines/backend/src/common/util.(*Workflow).VerifyParameters(0xc000010610, 0xc00036b6b0, 0x0, 0xc00036b6b0)
backend/src/common/util/workflow.go:66 +0x90
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).CreateRun(0xc00088b5e0, 0xc00088b880, 0xc0009c3c50, 0xc000010450, 0x1)
backend/src/apiserver/resource/resource_manager.go:326 +0x27c
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*RunServer).CreateRun(0xc0000b8718, 0x1e7bc20, 0xc0009c3c50, 0xc0009c3c80, 0xc0000b8718, 0x2ddc6e9, 0xc00014e070)
backend/src/apiserver/server/run_server.go:43 +0xce
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler.func1(0x1e7bc20, 0xc0009c3c50, 0x1aa80e0, 0xc0009c3c80, 0xc0008cbb40, 0x1, 0x1, 0x7f9e4d6466d0)
bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1399 +0x86
main.apiServerInterceptor(0x1e7bc20, 0xc0009c3c50, 0x1aa80e0, 0xc0009c3c80, 0xc000778ca0, 0xc000778cc0, 0xc0004dcbd0, 0x4e7bba, 0x1a98e00, 0xc0009c3c50)
backend/src/apiserver/interceptor.go:30 +0xf8
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler(0x1ac4a20, 0xc0000b8718, 0x1e7bc20, 0xc0009c3c50, 0xc0009c6e40, 0x1c6bd70, 0x1e7bc20, 0xc0009c3c50, 0xc0004321c0, 0x66)
bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1401 +0x158
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700, 0xc00071ab70, 0x2e14040, 0x0, 0x0, 0x0)
external/org_golang_google_grpc/server.go:995 +0x466
google.golang.org/grpc.(*Server).handleStream(0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700, 0x0)
external/org_golang_google_grpc/server.go:1275 +0xda6
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0004e9084, 0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700)
external/org_golang_google_grpc/server.go:710 +0x9f
created by google.golang.org/grpc.(*Server).serveStreams.func1
external/org_golang_google_grpc/server.go:708 +0xa1
```
It was the same in CreateJob calls.
The scenario described in #4389 also seems to cause the same issue.
With this PR, we aim not to have the segmentation fault at least, because in
our case it's expected that manifest files will be deleted after some time due
to the retention policy.
Other problems about right pipeline version picking described in issue #4389
still need to be addressed.
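The fix amounts to guarding against a missing workflow manifest before calling `VerifyParameters`, instead of dereferencing a nil pointer. The backend is Go; this is a Python sketch of the guard, with all names and the manifest layout being illustrative assumptions.

```python
class InvalidInputError(Exception):
    """Stand-in for the API server's user-facing error type."""

def verify_parameters(manifest, parameters):
    """Check that every supplied parameter is declared by the workflow."""
    declared = {
        p["name"]
        for p in manifest.get("spec", {}).get("arguments", {}).get("parameters", [])
    }
    unknown = [name for name in parameters if name not in declared]
    if unknown:
        raise InvalidInputError(f"unknown parameters: {unknown}")
    return True

def create_run(workflow_manifest, parameters):
    """Fail with a clear error when the manifest is missing (e.g. deleted
    from object storage by a retention policy) instead of crashing."""
    if workflow_manifest is None:
        raise InvalidInputError(
            "workflow manifest not found; it may have been removed by a "
            "retention policy")
    return verify_parameters(workflow_manifest, parameters)
```

The guard turns the SIGSEGV in the stack trace above into a well-formed API error that CreateRun (and CreateJob) can return to the caller.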
Reduced the timeout from 30 seconds to 5.
This should not be needed, as most users tell us that pods work even when the cache service is unavailable. But at least one customer experienced timeout failures when creating pods after the service was deleted but the webhook config was not.
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* switch from bazel build to go build
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* prometheus configs; basic metrics in pipeline server to collect
prometheus metrics
* make version consistent
* check if we gc workflows
* add prom deps
* upload counts
* remove non-code changes
* more metrics
* upload server metrics guarded by flag
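Guarding metric collection behind a flag, as in the commit above, can be sketched with a minimal counter. The backend uses the Go Prometheus client; the hand-rolled `Counter` class and all names here are illustrative stand-ins.

```python
class Counter:
    """Minimal stand-in for a Prometheus counter (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.value = 0

    def inc(self):
        self.value += 1

# Assumed metric name for illustration.
UPLOAD_COUNT = Counter("pipeline_upload_count")

def record_upload(metrics_enabled):
    """Increment the upload counter only when the metrics flag is on."""
    if metrics_enabled:
        UPLOAD_COUNT.inc()
```

Keeping the flag check at the recording site means deployments that have not opted in pay no cost and expose no extra series.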
* todo for a flag
* fix test
* fix tests
* fix tests
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* add a quick guide on how to generate api reference from kfp api definition
* remove trailing lines
* Backend - Cache - Fixed reinstallation by adding missing roles
* Stop ignoring the deletion errors
* Added patch permission as well
It should not be triggered, but might be useful in the future.
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* sorting by run metrics is different from sorting by name, uuid, created at, etc. The latter are direct fields in the listable object; the former is an element in an array-typed field of the listable object. In other words, the latter are columns in the table; the former is not.
* unit test: add sorting on metrics with both asc and desc order
* GetFieldValue in all models
* fix unit test
* flag for whether to test in list_test; it's hacky to check mode == 'run'
* move model-specific code to the model; prevent the model package from depending on the list package; let the list package depend on the model package; marshal/unmarshal the listable interface; include the listable interface in the token.
* some assumptions on the token's Model field
* fix the regular field checking logic
* add comment to help devs to use the new field
* add a validation check
* The Listable object can be too large to be in the token, so replace it with only the
relevant fields taken out of it. In the future, if more fields of the
Listable object become relevant, manually add them to the token
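The token change described above — carry only the fields needed to resume the listing, not the whole Listable object — can be sketched as a small encode/decode pair. The payload field names and base64-of-JSON encoding are assumptions for illustration.

```python
import base64
import json

def encode_page_token(sort_by_value, key_value):
    """Put only the fields needed to resume listing into the token:
    the value of the sort-by field and the primary-key value."""
    payload = {"sort_by_value": sort_by_value, "key_value": key_value}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def decode_page_token(token):
    """Unmarshal the token back into its resume fields."""
    return json.loads(base64.urlsafe_b64decode(token.encode()).decode())
```

This keeps tokens small and opaque while still letting the next page's query seek past the last returned row, even when sorting by a run metric rather than a table column.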
* matches func update
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* sorting by run metrics is different from sorting by name, uuid, created at, etc. The latter are direct fields in the listable object; the former is an element in an array-typed field of the listable object. In other words, the latter are columns in the table; the former is not.
* unit test: add sorting on metrics with both asc and desc order
* list is generic. model specific test is put to run_store_test.go
* Metadata Writer - Fixed regression with artifact type retrieval
The DSL compiler has changed the output name sanitization rules, so we should change them here accordingly.
* Added the code link
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* metrics as the outermost
* columns swap
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* pin google-api-core to 1.16.0 until it gets a newer release than 1.21.0
* add comments
* Backend - Only compiling the preloaded samples
Fixes https://github.com/kubeflow/pipelines/issues/4117
* Fixed the paths
* Removed -o pipefail for now since sh does not support it
* Fixed the quotes
* Removed the __future__ imports
Python 2 is no longer supported.
The annotations cause compilation problems:
```
File "/samples/core/iris/iris.py", line 18
from __future__ import absolute_import
^
SyntaxError: from __future__ imports must occur at the beginning of the file
```
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* tfx 0.21.2 -> 0.22.2
* tfx 0.20.2 -> 0.22.0
* update requirements.txt
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* add filter for listing versions
* add another filter in test
* comment revision
* reduce ttl of persisted final workflow to 1 day
* add comment
* enable pagination when expanding experiment in both the home page and the archive page
* Revert "enable pagination when expanding experiment in both the home page and the archive page"
This reverts commit 5b672739dd.
* Address comments