Commit Graph

492 Commits

Author SHA1 Message Date
StefanoFioravanzo b5c10d2a21
feat(swf): Add a [[RunUUID]] macro (#4995)
* [SWF] Add [[RunUUID]] macro

Signed-off-by: Stefano Fioravanzo <stefano@arrikto.com>

* [SWF] Fix typo in function name

Signed-off-by: Stefano Fioravanzo <stefano@arrikto.com>
2021-02-01 03:35:02 -08:00
Chen Sun 26de102f82 chore(release): bumped version to 1.4.0-rc.1 2021-02-01 00:18:50 -08:00
James Liu 35bc50e8e7
Upgrade tfx version to 0.26.0 in backend (#5052) 2021-01-28 17:49:01 -08:00
Niklas Hansson 2f04bc6697
fix(backend): fix periodic schedule to begin at start time. Fixes #3935 (#5027)
* Fixed periodic schedule to start at starttime

* clean up

* clean up
2021-01-26 03:37:08 -08:00
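
The fix above can be read as: the first run of a periodic schedule lands on the configured start time, and later runs fall on the start time plus whole multiples of the interval. A minimal sketch of that arithmetic follows. The helper `next_periodic_run` and its semantics are assumptions for illustration, not the backend's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def next_periodic_run(start, interval, now):
    """Return the first run time that is >= now, anchored at start.

    Hypothetical helper; the name and semantics are assumptions.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now <= start:
        # Nothing has run yet: the schedule begins exactly at the start time.
        return start
    elapsed = now - start
    # Ceiling division: whole intervals needed to reach or pass `now`.
    periods = -(-elapsed // interval)
    return start + periods * interval
```

Anchoring on `start` rather than on the time the schedule was created keeps every run on the same grid, regardless of when the controller first sees the schedule.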
James Liu a8b7fc97b1
fix(test): Fix presubmit with python version upgrade (#5033)
* Fix presubmit with python version upgrade

* Update Dockerfile

Co-authored-by: Yuan (Bob) Gong <4957653+Bobgy@users.noreply.github.com>
2021-01-26 01:47:00 -08:00
Ben Wallace 9a30e973d9
fix(backend): allow multiple values per key/op in filter. Fixes #4975 (#4990)
* modify filter fields to map strings to slices of structs

* fix broken json tests

* add unit tests for new functionality
2021-01-20 15:57:00 -08:00
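
The first bullet above describes mapping each filter key to a slice of values, so that two predicates on the same key and operation no longer overwrite each other. A rough sketch of that idea, assuming a hypothetical `(op, key, value)` predicate tuple and made-up operation names:

```python
from collections import defaultdict


def group_predicates(predicates):
    """Group (op, key, value) predicates so repeated keys keep every value.

    The tuple layout and operation names are assumptions for illustration.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for op, key, value in predicates:
        # Append instead of assign: a second value for the same key survives.
        grouped[op][key].append(value)
    return {op: dict(keys) for op, keys in grouped.items()}
```

With a plain `key -> value` map, the second predicate on `name` would silently replace the first; the list keeps both.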
Niklas Hansson ca09c7a026
fix(backend): add default value for CRON_SCHEDULE_TIMEZONE (#4977)
* fixed config so the parameter is available for GCP Marketplace as well

* wrong value
2021-01-14 12:33:31 -08:00
capri-xiyue 4ab5d63f71
fix(backend): Change enqueue base delay in non-error mode to 1 second for persistent agent (#4957)
* Change enqueue base delay in non-error mode to 1 second

* added Terminating constants

* used terminating constants in store layer

* modified comments
2021-01-14 11:18:55 -08:00
Niklas Hansson eeb7f8f04a
fix(backend): make the scheduleworkflowcontroller timezone aware. Fixes #2653 (#4641)
* initial work

* small fixes

* updated tests and how parameter are set

* try to fix test

* check without adding missing test

* fixed small typo

* test changes

* updated config

* typo

* updated after feedback

* fixed pointer error

* test to add parameter

* moved to init so removed not needed code

* updated further

* updated tests to also check endtime

* clean up test

* fixed failing test

* fixed the expected test results

* added timezone examples

* further clean up

* fixed time format

* Update params.env

* moved location to cronjobscheduler

* clean up

* set env variable to empty

* reverted back

* updated to turn the magic number into a constant

* updated the tests with comment

* added comments on cron expressions

* update naming and return types

* updated to UTC as default

* updated with an alpha notice
2021-01-10 00:59:05 -08:00
Yang Pan c484cfa46c chore(release): bumped version to 1.3.0 2021-01-07 00:39:26 -08:00
Yuan (Bob) Gong c72bac36b1
chore: add capri-xiyue as backend reviewer (#4964) 2021-01-06 19:11:45 -08:00
capri-xiyue 768317aee3
fix(backend): fixed validation logic and resource manager logic when creating job and run (#4914)
* modified validation logic of run and job

* fixed resource manager logic when creating job and run

* removed unused methods, changed to nested if else

* fixed nits

* fixed nits

* fixed nits
2020-12-22 23:24:26 -08:00
capri-xiyue 1791d8e185
docs(backend): update docs of deploying apiserver (#4930) 2020-12-22 22:34:26 -08:00
Chen Sun 5445ce82c7 chore(release): bumped version to 1.2.0 2020-12-17 23:24:32 -08:00
Yuan (Bob) Gong 135bfbb9f2
test(cache): fix cache deployer image build in apk add command (#4902) 2020-12-15 20:06:19 -08:00
Yuan (Bob) Gong 9acc440c31
docs(backend): clean up readme (#4896) 2020-12-14 21:37:48 -08:00
numerology 1449cfe0a5 chore(release): bumped version to 1.1.2 2020-12-14 09:43:07 -08:00
Yuan (Bob) Gong 6895f1977b
chore(backend): delete outdated backend/api/Makefile. Fixes #4717 (#4893) 2020-12-13 18:05:28 -08:00
Yuan (Bob) Gong 4df8925b05
fix(backend): job api -- deletion/disabling should succeed when swf not found. Fixes #4871 (#4884)
* fix(backend): job api -- deletion should succeed when swf not found

* bug reproducing unit test

* fix the bug and pass reproducing unit test

* reproducing integration test

* fix integration test

* clarify error message

* disable job should also succeed, unify term to CR instead of CRD

* fix unit test error

* fix error message

* improve logging
2020-12-09 18:12:52 -08:00
Rui Fang a0a1a5d0cf chore(release): bumped version to 1.1.2-rc.1 2020-12-04 07:09:17 +00:00
Rui Fang 8a22a89c7d
chore(release): upgrade mlmd to 0.25.1 (#4859)
* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* upgrade mlmd to 0.25.1

* Update requirement.txt and its scripts
2020-12-02 22:13:00 -08:00
hilcj c1aebb5d22 chore(release): bumped version to 1.1.1-beta.1 2020-11-26 17:58:04 +00:00
hilcj 4fe4a30545 Revert "chore(release): bumped version to 1.1.1-beta.1"
This reverts commit 9af3e79c10.
2020-11-26 16:10:10 +00:00
hilcj 9af3e79c10 chore(release): bumped version to 1.1.1-beta.1 2020-11-26 04:32:09 +00:00
hilcj bd86072a8c Revert "chore(release): bumped version to 1.1.1.beta.1"
This reverts commit 5928a2659b.
2020-11-26 04:20:10 +00:00
hilcj 5928a2659b chore(release): bumped version to 1.1.1.beta.1 2020-11-26 03:07:26 +00:00
DavidSpek 0df9473bba
feat: Set current namespace for in-cluster SDK in multi-user mode and add healthz endpoint to API backends (#4638)
* Set current namespace in local KFP context if running from notebook

* Create "~/.config/kfp/" instead of ".config/kfp/"

At first it was assumed the `get_user_namespace` command would be executed from the home directory.

* Create local context file if it doesn't exist during set_user_namespace

* Grab path from LOCAL_KFP_CONTEXT when creating folder

Instead of hardcoding the os.makedirs path to `~/.config/kfp`, it now grabs it from the LOCAL_KFP_CONTEXT. Also, removed path creation in `get_user_namespace` as that is now handled in `set_user_namespace`. It now also checks whether the path exists rather than the local_context_file, which avoids trying to create ~/.config/kfp/ when the path exists but context.json does not.

* add multi-user setting to healthz api

* Add http prefix to health api url

* move healthz api call to own function and fix multi_user boolean

* Fix HEALTH_PATH declaration

* Move check to Client __init__ and change get_kfp_healthz to avoid breaking in case of old apiserver image

* Add multi_user to frontend healthz

* Expose multi_user in frontend and add integration test

* Fix integration test

* Fix host hardcoding and error handling

* Handle empty API response, check if API up to date

* Fix response return

* remove API check due to empty response

* retry API call if first response empty

* retry getting healthz api if no response

* change health_api to https

The healthz_api has been returning empty responses which might be caused by sending an http request to an https endpoint. Although requests handles redirects, this commit is to test if this solves the issue.

* Add some debug info to healthz exception

* add url to debug and lower retries to 1

* Use api_client to get healthz data

* Debug info for API response

* Follow API redirect history

* Fix indentation

* Add healthz proto

* Try getting healthz api with new python backend

* Add installation of kfp_server_api in tests

* Fix incorrect setup location

* Replace old .get with new http backend .multi_user

* Code clean up

* Small fixes and TimeOutError for retries healthz api

* Remove changes to go dependencies

* Send empty proto request and fix exception client

* Remove unused commit_sha and tag_name
2020-11-24 15:36:39 -08:00
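
The context-file handling described in this commit (derive the folder from the context path, create it only if the folder itself is missing, then write the file) might look roughly like the sketch below. The `context_path` argument stands in for LOCAL_KFP_CONTEXT, and the `"namespace"` JSON key is an assumption; this is not the SDK's actual code.

```python
import json
import os


def set_user_namespace(namespace, context_path):
    """Persist the namespace to a JSON context file, creating its folder first."""
    context_dir = os.path.dirname(context_path)
    # Check the directory, not the file: the file may be missing
    # while the directory already exists.
    if not os.path.exists(context_dir):
        os.makedirs(context_dir)
    with open(context_path, "w") as f:
        json.dump({"namespace": namespace}, f)


def get_user_namespace(context_path):
    """Read the namespace back, or return None if no context file exists."""
    if not os.path.exists(context_path):
        return None
    with open(context_path) as f:
        return json.load(f).get("namespace")
```

In real use the path would be expanded from the home directory, for example `os.path.expanduser("~/.config/kfp/context.json")`, so the result does not depend on the current working directory.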
Ilias Katsakioris 39203d5ffa
feat(backend): Refactor authz to perform SubjectAccessReview. Fixes #3513 (#4723)
* [Backend] Return proper error codes for failures during auth

* [Backend] Implement helpers to initialize a SubjectAccessReview client

In preparation of SubjectAccessReview, we implement some helpers to
create a new Kubernetes Authorization clientset and return the
SubjectAccessReview client.
We also define some fake clients to be used by future tests.

* [Backend] Introduce RBAC-related constants

In preparation of SubjectAccessReview, introduce RBAC groups, resources,
and verbs.

* [Backend] Extend managers with a SubjectAccessReviewClient

* [Backend] Refactor the authorization mechanism for requests

Authorization should be based on performing some action on a resource
living in a namespace. This commit refactors the authorization utilities
to reflect this and perform SubjectAccessReview.

This commit also deletes some tests based on old authn/authz mechanism.
A following commit will fix/extend the tests for the new mechanism

* [Backend] Adjust endpoints to pass resource attributes for authz

With KFAM authorization, we passed only the namespace attribute for
authorization. With SubjectAccessReview, we need a richer list of
attributes. Thus, we adjust endpoints to pass request details (resource
attributes) necessary for authorizing the request. We only change the
already authorized endpoints, not introducing any new checks.

* [Backend] Adjust apiserver/server tests to SubjectAccessReview

* [Backend] Purge KFAM

Since we no longer use KFAM, we may as well purge it

* [Backend] Update BUILD files

Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>

* [Manifests] Extend manifests for SubjectAccessReview

* API Server: Allow creating SubjectAccessReviews
* Add view/edit roles in a multi-user kustomization
2020-11-17 14:56:05 -08:00
Niklas Hansson 1b924e6e72
fix(Process): update backend development README. Fixes #4750 (#4774)
* updated backend README.md

* updated the name
2020-11-16 21:14:04 -08:00
Rafał Bigaj 678ae0fe08
feat(backend): new server API to read run log. Fixes #4468 (#4493)
* New server API: read run log

- The new server API endpoint (/apis/v1beta1/runs/{run_id}/nodes/{node_id}/log) to fetch run log
- `ARCHIVE_LOG_FILE_NAME` and `ARCHIVE_LOG_PATH_PREFIX` options control the archive log path
- UI Server fetches logs from server API or directly from k8s depending on `STREAM_LOGS_FROM_SERVER_API` option

* New server API: read run log

- ml-pipeline rbac update: allow for access to log

* Read run log: enhanced error handling

- log message on Pod access errors

* Read run log: enhanced log archive options

* Code format

* Test update after getPodLogs signature change

* Updated comments after review

* `follow` query parameter in GET /apis/v1beta1/runs/{run_id}/nodes/{node_id}/log

* Env variable friendly config names & comments

- Config options: ARCHIVE_CONFIG_LOG_FILE_NAME, ARCHIVE_CONFIG_LOG_PATH_PREFIX
- Copyright message update
- New endpoint as `v1alpha1`

* Licence updates

- fluent-bit licence inlined
- copyright message updates

* Master merge

- dependency conflicts
2020-11-11 00:37:50 -08:00
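
For orientation, a client could assemble the log endpoint named in this commit as sketched below. The host, the `"true"` encoding of `follow`, and the helper name are assumptions. The API version is a parameter because the commit body mentions both `v1beta1` and `v1alpha1`.

```python
from urllib.parse import quote, urlencode


def run_log_url(host, run_id, node_id, follow=False, api_version="v1beta1"):
    """Build the run-log URL for one node of a run (illustrative helper)."""
    path = "/apis/{}/runs/{}/nodes/{}/log".format(
        api_version, quote(run_id, safe=""), quote(node_id, safe="")
    )
    # Only add the query string when streaming is requested.
    query = "?" + urlencode({"follow": "true"}) if follow else ""
    return host.rstrip("/") + path + query
```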
Yuan Gong e28ec4d2df chore: update OWNERS 2020-11-09 10:20:31 +08:00
Yuan Gong 66ccf335e8 chore: update OWNERS 2020-11-09 10:18:18 +08:00
Niklas Hansson 92a932e9d9
fix(backend): added setup structure to simplify adding new tests and removing duplicated code. Fixes #4630 (#4639)
* simplified test

* Updated and refactored the tests further

* fixed error in search replace

* test cleanup

* fixed line length

* updated the naming

* updated tests

* removed parameter that is not needed
2020-11-04 14:54:53 -08:00
Yuan Gong 7d36f48482 chore(release): bumped version to 1.1.0-alpha.1 2020-11-02 03:01:27 +00:00
Niklas Hansson a6c79c2e2c
test(backend): added missing assert statement in job_api_test . (#4705) 2020-11-01 16:36:52 -08:00
Alexey Volkov ec65dfe70a
feat(backend): Metadata Writer - Record parameter argument values to MLMD (#4564)
Previously, Metadata Writer could only store input artifacts, but could not store input parameter arguments (since they were not available).
The SDK can now preserve parameter arguments in Argo template annotation.
The commit makes Metadata Writer extract information from that annotation and record it to MLMD.

Fixes https://github.com/kubeflow/pipelines/issues/4556
2020-10-27 00:23:59 -07:00
Alexey Volkov fc6b5d6c2e
fix(backend): Metadata Writer - Fixed setting execution custom properties (#4670) 2020-10-24 20:19:00 -07:00
Niklas Hansson d7793aff1b
fix(backend): updated the argo version to 2.7.7. Fixes #4392 (#4498)
* updated the version

* updated the serializer

* fixed test

* fixed some more changes

* tested to update versions of k8 packages

* reverted package update

* change in API

* fixed dependencies, need to fix broken tests now

* updated fake client and fixed test due to updates in timestam.timestamp

* missed to update fake client pod

* fixed issue in controller

* tested to update

* updated

* updated controller viewer

* updates to fix go mod vendor

* Updated the client

* updated the golang versions

* missed one docker file update, from 1.11 -> 1.13

* testing to fix persistence agent issues

* Updated after feedback

Co-authored-by: Niklas hansson <niklashansson@Niklass-MacBook-Pro.local>
2020-10-22 17:09:36 -07:00
Alexey Volkov dde7f9a5d6
fix(backend): Caching - Fixed deployer failure on Kubernetes v1.16+. Fixes #4627 (#4632)
* Backend - Caching - Fixed deployer failure on Kubernetes v1.16+

The sideEffects field became required in v1 version of the resource https://github.com/kubernetes/kubernetes/pull/79549

Also adding failurePolicy: Ignore, because the default value has changed to Fail in v1.16.

These changes are not needed for v1beta1, but I still add them for those cases as well for consistency.

* The admissionReviewVersions field became required in the v1 API in v1.16

See https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#request
2020-10-19 20:18:08 -07:00
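
The three fields this commit mentions (`sideEffects`, `failurePolicy`, `admissionReviewVersions`) sit on each webhook entry of the configuration object. A minimal sketch of such an object, built as a Python dict, follows. The resource kind, all names, the rule set, and the chosen values are assumptions; `caBundle` is left out for brevity.

```python
def build_webhook_config(name, service_name, namespace, path):
    """Return a webhook configuration with the fields the v1 API requires.

    Illustrative only: kind, names and rules are assumptions.
    """
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "MutatingWebhookConfiguration",
        "metadata": {"name": name},
        "webhooks": [{
            "name": name + ".example.com",
            "clientConfig": {
                "service": {"name": service_name, "namespace": namespace, "path": path},
            },
            "rules": [{
                "operations": ["CREATE"],
                "apiGroups": [""],
                "apiVersions": ["v1"],
                "resources": ["pods"],
            }],
            # Required in v1: declare that the webhook has no side effects.
            "sideEffects": "None",
            # Set explicitly so an unavailable webhook does not block requests.
            "failurePolicy": "Ignore",
            # Required in v1: AdmissionReview versions the webhook understands.
            "admissionReviewVersions": ["v1beta1"],
        }],
    }
```

Setting `failurePolicy` explicitly avoids depending on the default, which the commit notes has changed to `Fail`.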
Niklas Hansson 2317015085
feat(backend): allow configuring if default version should be updated when uploading new pipeline version. Fixes #4049 (#4476)
* update to fetch remote

* missed to add the description

* fixed merge conflict

* initial work

* fixed test and bug

* updated python client

* clean up

* clean up

* added config default

* fixed bug in API

* moved config value

* reverted to load from config

* clean up

* Update _client.py

* removed unnecessary function and updated after feedback

* missed to save pipeline.proto

* updated the last parts after feedback

* reverted back to use string and env variable

* updated typo

* fix typo in path

* clean up

* removed option in api

* clean up python part

* typo, can't run test locally

* clean up, problems with local env

* clean up missing differences

* reverted proto files

* further clean up

* clean up

* updated after feedback

* Added tests

* error in my defer statement

* Updated the test
2020-10-19 02:08:14 -07:00
Niklas Hansson 5742991c1a
feat(API): exposing api for setting the default version of pipeline. Fixes #4049 (#4406)
* initial work on exposing the default version of pipeline

* update description

* added missing files

* updated api build, unsure if this is correct ...

* updated after feedback

* clean up

* remove empty line

* started to make the integration test

* added integration test

* fixed build and feedback

* updated the tests

* Updated the test

* new test

* typo

* updated the pipeline default

* updated the pipeline version

* formatting

* error in comparison
2020-10-15 16:19:25 -07:00
Alexey Volkov 58584e1d1f
fix(cache): Cache deployer - Using the same kubectl version as the server (#4525)
* Cache deployer - Using the same kubectl version as the server

Fixes https://github.com/kubeflow/pipelines/issues/4505

* Changed the PATH precedence

* Unquoted the jq output

* Fixed the curl options
2020-09-29 22:47:24 -07:00
Alexey Volkov 24217ff4ab
fix(backend): Cache Deployer - Fixed grep call (#4568) 2020-09-29 21:55:24 -07:00
Yihong Wang 06bf42998e
chore(backend): remove unused import in .proto (#4448)
nit: Clean up unused `import`s in proto files.
Since there is no actual code change, code generation is skipped.
2020-09-22 00:28:47 -07:00
Yuan (Bob) Gong d91a0c9da1
chore(release): bump version to 1.0.1 on master branch (#4492)
* chore(release): bump version to 1.0.1 on master branch

* remove rc changelog
2020-09-14 01:28:58 -07:00
Yuan (Bob) Gong 29a6aaa4e4
fix(backend): persistence agent - workflow not found error should be a permanent error (#4486)
* fix(backend): workflow not found error should be permanent

* failing test case

* Fix logic

* fix another case

* Switched to not found error

* not found error should be permanent
2020-09-11 12:42:09 -07:00
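
The distinction this commit draws is that a "not found" result will not fix itself on retry, so it should stop the retry loop, while other failures may be transient. A hedged sketch of that control flow, with hypothetical exception classes and a hypothetical `sync` callable:

```python
class NotFoundError(Exception):
    """Hypothetical: the workflow no longer exists."""


class PermanentError(Exception):
    """Hypothetical: retrying cannot succeed, so the item should be dropped."""


def sync_with_retries(sync, max_attempts=3):
    """Call sync(), retrying transient failures but not 'not found' ones."""
    for attempt in range(1, max_attempts + 1):
        try:
            return sync()
        except NotFoundError as err:
            # Retrying cannot bring a deleted workflow back: give up at once.
            raise PermanentError(str(err)) from err
        except Exception:
            # Anything else is treated as transient until attempts run out.
            if attempt == max_attempts:
                raise
```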
Yuan (Bob) Gong c64820ad25
build: remove our own tools to comply with pypi package licenses. Fixes #4461 (#4462)
* improve license.sh logging

* build: remove our own scripts to comply with pypi package licenses

* Remove unneeded packages when we do not need to handle licensing ourselves
2020-09-03 17:41:43 -07:00
Yuan (Bob) Gong b1dcedc6bd
test(backend): Fix upgrade_test flakiness. Fixes #4426 (#4460) 2020-09-02 23:55:41 -07:00
Erhan Kesken 1f2d417e31
fix(backend): skip reporting native Argo workflows which do not have Run ID label. Fixes #3584 (#4438)
Fixes 3584.

For clusters with existing native Argo Workflows, ml-pipeline logs were dirtied
with unnecessary stack traces due to the "missing Run ID label" situation.

Made persistenceagent skip the workflow if it lacks the Run ID label, and
added the workflow name to the previous error message on the apiserver side.
2020-09-02 02:57:13 -07:00
Alexey Volkov 6b54eecf28
fix(backend): Caching - Only send cache-enabled pods to the caching webhook (#4429)
* Backend - Caching - Only send cache-enabled pods to the caching webhook

The caching webhook already checks whether the pod is cache-enabled, but this change makes the check happen sooner - even before calling the webhook.
This way the webhook cannot possibly affect any non-KFP pods.

This feature requires API v1 and Kubernetes v1.15, so we use it conditionally.

* Support filtering on Kubernetes v1.15 as well
2020-09-02 02:09:25 -07:00
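
One way to picture the change: the webhook configuration carries a label selector, and the API server only calls the webhook for pods whose labels match it. The sketch below assumes the filter is a `matchLabels` selector and assumes the label key shown; neither detail is stated in the commit message.

```python
# Assumed label key and value, for illustration only.
CACHE_ENABLED_LABEL = "pipelines.kubeflow.org/cache_enabled"
OBJECT_SELECTOR = {"matchLabels": {CACHE_ENABLED_LABEL: "true"}}


def selector_matches(selector, pod_labels):
    """Return True if every matchLabels entry is present on the pod."""
    required = selector.get("matchLabels", {})
    return all(pod_labels.get(key) == value for key, value in required.items())
```

Pods without the label never reach the webhook, which is how the change keeps it from affecting non-KFP pods.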
Daehee Kim cd9c9ff2b2
fix(backend): add `MaxCallRecvMsgSize(math.MaxInt32)` to proxy server (#4402) 2020-09-02 02:09:17 -07:00
frozeNinK 32c9c2ac86
fix(backend): fix typo in reference key type (#4376)
* Fix typo in reference key type

* well...
2020-09-02 02:09:09 -07:00
Erhan Kesken ec733c9a42
fix(backend): prevent seg fault if workflow manifest is deleted. Fixes #4389 (#4439)
Fixes #4389 (partially).

When the workflow manifest file is deleted from s3 due to the retention policy, we were
getting this segmentation fault in the next createRun attempt for that pipeline:

```
I0831 06:36:53.916141       1 interceptor.go:29] /api.RunService/CreateRun handler starting
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x148 pc=0x156e140]

goroutine 183 [running]:
github.com/kubeflow/pipelines/backend/src/common/util.(*Workflow).VerifyParameters(0xc000010610, 0xc00036b6b0, 0x0, 0xc00036b6b0)
        backend/src/common/util/workflow.go:66 +0x90
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).CreateRun(0xc00088b5e0, 0xc00088b880, 0xc0009c3c50, 0xc000010450, 0x1)
        backend/src/apiserver/resource/resource_manager.go:326 +0x27c
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*RunServer).CreateRun(0xc0000b8718, 0x1e7bc20, 0xc0009c3c50, 0xc0009c3c80, 0xc0000b8718, 0x2ddc6e9, 0xc00014e070)
        backend/src/apiserver/server/run_server.go:43 +0xce
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler.func1(0x1e7bc20, 0xc0009c3c50, 0x1aa80e0, 0xc0009c3c80, 0xc0008cbb40, 0x1, 0x1, 0x7f9e4d6466d0)
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1399 +0x86
main.apiServerInterceptor(0x1e7bc20, 0xc0009c3c50, 0x1aa80e0, 0xc0009c3c80, 0xc000778ca0, 0xc000778cc0, 0xc0004dcbd0, 0x4e7bba, 0x1a98e00, 0xc0009c3c50)
        backend/src/apiserver/interceptor.go:30 +0xf8
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler(0x1ac4a20, 0xc0000b8718, 0x1e7bc20, 0xc0009c3c50, 0xc0009c6e40, 0x1c6bd70, 0x1e7bc20, 0xc0009c3c50, 0xc0004321c0, 0x66)
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1401 +0x158
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700, 0xc00071ab70, 0x2e14040, 0x0, 0x0, 0x0)
        external/org_golang_google_grpc/server.go:995 +0x466
google.golang.org/grpc.(*Server).handleStream(0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700, 0x0)
        external/org_golang_google_grpc/server.go:1275 +0xda6
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0004e9084, 0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700)
        external/org_golang_google_grpc/server.go:710 +0x9f
created by google.golang.org/grpc.(*Server).serveStreams.func1
        external/org_golang_google_grpc/server.go:708 +0xa1
```

The same happened in CreateJob calls.

The scenario described in #4389 also seems to cause the same issue.

With this PR, we aim not to have the segmentation fault at least, because in
our case it's expected that manifest files will be deleted after some time due
to the retention policy.

Other problems with picking the right pipeline version, described in issue #4389,
still need to be addressed.
2020-09-02 00:13:06 -07:00
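
The stack trace above ends in parameter verification on a workflow that was never loaded. The general remedy is to check for the missing manifest and return an ordinary error before touching it. A sketch of that guard, using a hypothetical error type and a simplified workflow shape (a dict with a `parameters` list), not the backend's actual types:

```python
class InvalidInputError(Exception):
    """Hypothetical error returned to the caller instead of crashing."""


def verify_parameters(workflow, parameters):
    """Check that every supplied parameter is declared by the workflow."""
    if workflow is None:
        # The manifest may have been removed from storage (e.g. by a
        # retention policy); fail with a clear error, not a crash.
        raise InvalidInputError("workflow manifest is missing")
    declared = set(workflow.get("parameters", []))
    unknown = [name for name in parameters if name not in declared]
    if unknown:
        raise InvalidInputError("unknown parameters: " + ", ".join(unknown))
```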
Yuan (Bob) Gong 7840d30274
test(backend): fix TestJobApis test is flaky. Fixes #4419 (#4451) 2020-09-01 22:13:06 -07:00
Yuan (Bob) Gong 3fc32ace8b
build(backend): fix "ModuleNotFoundError: No module named setuptools._distutils". Part of #4443 (#4445) 2020-09-02 06:35:25 +08:00
Alexey Volkov ada18bc6e6
fix(backend): Caching - Reduced the cache webhook timeout (#4428)
Reduced the timeout from 30 seconds to 5.
This should not be needed as most users tell us that pods work even when the cache service is not available. But there was at least one customer who experienced timeout failures when creating pods after the service was deleted, but not the webhook config.
2020-08-28 05:16:53 -07:00
Alexey Volkov 40353bf6ac
fix(cache): Cache executions with no outputs. Fixes #3507 (#3808)
Fixes https://github.com/kubeflow/pipelines/issues/3507
2020-08-20 17:37:39 -07:00
jingzhang36 ec59846718
chore(backend): use go build instead of bazel build for api server's docker file (#4373)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* switch from bazel build to go build
2020-08-20 00:33:23 -07:00
Alexey Volkov 6d4c6632d3
chore(backend): Visualization: Using the correct Tensorflow image to prevent build time-outs (#4353)
Decrease failures like the following: https://00e9e64bac96f3b5ceb1c3cfb1005d0a00d4d7f2cd03a990dc-apidata.googleusercontent.com/download/storage/v1/b/363997316495.cloudbuild-logs.googleusercontent.com/o/log-4b63e78a-ea23-4c52-b3f8-adcfa52f7dce.txt?qk=AD5uMEtFzOOoFfudNzxaUuDUFdIaxG_zPSxzo-bRUvvcNMZcfl4hOOouQsL6l6WObtQzXTlxCdNKGYS9eD1oRPD6QkBD7Tb5eTO2s2LECqJUkBMAiHyiaJYqyWHbNvKk6l3l5wjGrQx2ToBQkBTCJhNO_lttWaQUvTDN1acZU85s1K7X51e3Z7sB_hBMDKdHdjZPlNJKaHaQvUshQJRlE_4HMA40sEhECxMWb8xncijYL1Wijnppj1Y6f9ANFcqR2DsNqeC-fLVqPYpegj9idSVBW_z23iRZRCjzCOXzk-LLkkbe-O1XK_NCIeaUrzoDNll8hiI2tJ4Yy_ozVYXj4UNtMV4sxAR72i6w7dy2u-Q1U6I1ocwSCSH84UV0LPzPmo3c9w6vacjJVLm4DnQNBDc6RfNgojutNGXL8_hn8TXOVgXKMRs4SbbLr216QpQlXn9McG84GOsWM-V1ayyS5aUpW-4DubF3-aeGNKgyvMwnTBqVDTHoi1NflKXxRCDDAW1i43q4hbeiq4LJXRl_5OqHl1PEC8gAGgLKNDy1jXMp3NM6cJqF3VpPzNEgS_y3I1DqjyFavkqvFxYMM8FX6cOKf-rBteF8TMzu3kAlGAc-E4QmQSSP_ygjb5ISdjkQ7bwHAit0BmeBej7ldsPAV8web8xIu3akvMlH0ZFuX-m807cYdJZU26-OlPwafVS5J_iyg--GNgQTVcKydAcjf590pmUuRLONsic5rEg67zBw_YdWR5bQoDRr8XSyHsg8EjN7aIjXudE06VN_eQOCxG940MH-zjJKhO5tIZTdpJT4-mcAw8x7Pg4H3Vihx-MVhSWGUfRJIJXASNgYEBJ5DEH1wN41zadYnBsc9r6OuPwfI7Z96zhSbghFz60KnRZbIuMVXow8yTlP&isca=1
2020-08-19 23:25:22 -07:00
jingzhang36 1747f8f258
backend: add prometheus metrics collection to KFP server. (#4354)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* prometheus configs; basic metrics in pipeline server to collect
prometheus metrics

* make version consistent

* check if we gc workflows

* add prom deps

* upload counts

* remove non-code changes

* more metrics

* upload server metrics guarded by flag

* todo for a flag

* fix test

* fix tests

* fix tests
2020-08-15 07:40:17 -07:00
frozeNinK 384afac4b1
fix(backend): improve forward / backward compatibility on db status table (#4351) 2020-08-13 00:26:13 -07:00
Alexey Volkov 89cbebb003
chore(backend): Visualization - Do not fail on extra licenses (#4355) 2020-08-12 17:00:15 -07:00
frozeNinK 514120167e
fix(backend): improve forward / backward compatibility on experiment table (#4349) 2020-08-12 06:55:45 -07:00
Alexey Volkov f35462fdb3
feat(cache): Explicitly specifying which attributes affect the cache key (#4076)
* Backend - Cache - Explicitly specifying which attributes affect the cache key

Fixes https://github.com/kubeflow/pipelines/issues/4038
Fixes https://github.com/kubeflow/pipelines/issues/3972

* Fixed the test

* Added comments to intersectStructureWithSkeleton

* Fixed the tests incorrectly modifying a global variable

* Added the test verifying the template cleanup
2020-08-12 03:43:45 -07:00
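
The idea of listing explicitly which attributes affect the cache key can be sketched as intersecting the template with a "skeleton" that names the relevant fields, then hashing what remains. The skeleton contents, function names, and hashing scheme below are assumptions for illustration; the commit only names `intersectStructureWithSkeleton`.

```python
import hashlib
import json

# Hypothetical skeleton: only these template attributes feed the cache key.
CACHE_KEY_SKELETON = {
    "container": {"image": None, "command": None, "args": None},
    "inputs": None,
    "outputs": None,
}


def intersect_with_skeleton(value, skeleton):
    """Keep only the parts of `value` that the skeleton names."""
    if isinstance(skeleton, dict) and isinstance(value, dict):
        return {
            key: intersect_with_skeleton(value[key], sub)
            for key, sub in skeleton.items()
            if key in value
        }
    # A None leaf (or a type mismatch) means "keep this value as-is".
    return value


def cache_key(template):
    """Hash only the attributes selected by the skeleton."""
    relevant = intersect_with_skeleton(template, CACHE_KEY_SKELETON)
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

An allowlist like this means newly added template fields do not change existing cache keys unless they are deliberately added to the skeleton.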
myonlyzzy bb21597d43
fix(backend): logs error when failing to init mysql. Fixes #4334 (#4335) 2020-08-11 22:21:44 -07:00
jingzhang36 b6c2c2aee6
doc: readme for how to auto-generate api reference from the backend api definitions (#4348)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* add a quick guide on how to generate api reference from kfp api definition

* remove trailing lines
2020-08-10 21:12:16 -07:00
Yuan (Bob) Gong 01a79980f6
fix(backend): reduce confusing ReadArtifact errors for metrics in api server. Fixes #3699 (#4338) 2020-08-09 21:26:19 -07:00
Alexey Volkov fe77c197d1
fix(backend): Backend - Cache - Fixed reinstallation. Fixes #4299 (#4320)
* Backend - Cache - Fixed reinstallation by adding missing roles

* Stop ignoring the deletion errors

* Added patch permission as well

It should not be triggered, but might be useful in the future.
2020-08-04 18:48:28 -07:00
Gabriele Santomaggio d4aabd15b1
chore(backend): fixes typo empty space (#4318)
Fix a stray space in the script; it could raise an error when using a
different shell such as zsh.
2020-08-04 08:00:20 -07:00
Yuan (Bob) Gong 335323353f
chore: pin visualization server python dependencies. Fixes #3078 (#4310)
* build: fix visualization server build failure by adding missing licenses for new deps

* wip

* chore: pin visualization server python dependencies

* updates

* update licenses
2020-08-03 03:55:40 -07:00
Yuan (Bob) Gong 2a65eec1fa
build: fix visualization server build failure by adding missing licenses for new deps (#4309)
* build: fix visualization server build failure by adding missing licenses for new deps

* download updated licenses
2020-08-02 20:39:39 -07:00
jingzhang36 9c6738fa80
feat(backend): sort by run metrics - step 3. Part of #3591 (#4251)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* sorting by run metrics is different from sorting by name, uuid, created at, etc. The latter are direct fields in the listable object, the former is an element in an array-typed field in the listable object. In other words, the latter are columns in the table, the former is not.

* unit test: add sorting on metrics with both asc and desc order

* GetFieldValue in all models

* fix unit test

* whether to test in list_test. It's hacky when check mode == 'run'

* move model specific code to model; prevent model package from depending on list package; let list package depend on model package; marshal/unmarshal listable interface; include listable interface in token.

* some assumption on token's Model field

* fix the regular field checking logic

* add comment to help devs to use the new field

* add a validation check

* Listable object can be too large to be in token. So replace it with only
relevant fields taken out of it. In the future, if more fields in
Listable object become relevant, manually add it to token

* matches func update
2020-07-31 02:41:06 -07:00
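
The distinction drawn in this commit, regular fields being direct attributes of a run while a metric is one element inside an array-typed field, can be shown with a small in-memory sort. The run shape, field names, and the treatment of runs lacking the metric are assumptions; the backend does this at the storage layer rather than in memory.

```python
REGULAR_FIELDS = {"name", "uuid", "created_at"}


def sort_runs(runs, sort_by, descending=False):
    """Sort runs by a regular field or by the value of a named metric."""
    if sort_by in REGULAR_FIELDS:
        # Regular field: read it directly off the run.
        def key(run):
            return run[sort_by]
    else:
        # Metric: search the array-typed `metrics` field for a matching name.
        def key(run):
            for metric in run.get("metrics", []):
                if metric["name"] == sort_by:
                    return metric["value"]
            # Assumption: runs without the metric sort as the smallest value.
            return float("-inf")
    return sorted(runs, key=key, reverse=descending)
```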
jingzhang36 d4d361626e
feat(backend): sort by run metrics - step 2 (#4235)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* sorting by run metrics is different from sorting by name, uuid, created at, etc. The latter are direct fields in the listable object, the former is an element in an array-typed field in the listable object. In other words, the latter are columns in the table, the former is not.

* unit test: add sorting on metrics with both asc and desc order

* list is generic. model specific test is put to run_store_test.go
2020-07-20 23:27:14 -07:00
Yuan (Bob) Gong 988f5b02e4
chore(release): bump version to 1.0.0 on master branch (#4249) 2020-07-20 02:04:51 -07:00
Alexey Volkov 1990588404
fix(backend): Metadata Writer - Fixed regression with artifact type retrieval. Fixes #3971 (#4231)
* Metadata Writer - Fixed regression with artifact type retrieval

The DSL compiler has changed the output name sanitization rules, so we should change them here accordingly.

* Added the code link
2020-07-17 03:53:01 -07:00
Yao Xiao d3d4dcbbc2
fix(backend): fixes useless error message when visualization-server is not accessible. Fixes #4157 (#4201)
Add another unit test to handle ServiceHostNotExistError
2020-07-16 17:05:00 -07:00
jingzhang36 a96e8fe94e
feat(backend): sorting on run metrics - step 1 (#4203)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* metrics as the outermost

* columns swap
2020-07-13 17:33:21 -07:00
Yuan (Bob) Gong e4f4250fa8
fix(cache): cache-deployer should check both secret and config (#4186) 2020-07-09 02:38:02 -07:00
Yuan (Bob) Gong 75336f7395
fix(cache): Fix cache deployer not regenerating secrets when secret not present (#4171) 2020-07-08 09:59:09 -07:00
jingzhang36 cf29c61e49
fix(backend): fix the google-api-core to 1.16.0 for backend visualization server. (#4158)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* fix google api core to 1.16.0 until it gets newer release than 1.21.0

* add comments
2020-07-08 07:30:07 +08:00
Alexey Volkov bd0f4d23b9
chore(backend): Only compiling the preloaded samples. Fixes #4117 (#4118)
* Backend - Only compiling the preloaded samples

Fixes https://github.com/kubeflow/pipelines/issues/4117

* Fixed the paths

* Removed -o pipefail for now since sh does not support it

* Fixed the quotes

* Removed the __future__ imports

Python 2 is no longer supported.
The annotations cause compilation problems:
```
  File "/samples/core/iris/iris.py", line 18
    from __future__ import absolute_import
    ^
SyntaxError: from __future__ imports must occur at the beginning of the file
```
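For illustration, the error above can be reproduced without the sample file. This is a minimal sketch, assuming the failure comes from a statement preceding the `__future__` import; the source strings and the `compiles` helper are hypothetical and not taken from the repository.

```python
# Hypothetical sources: only a docstring, comments, or blank lines may
# precede a `from __future__` import.
BAD_SOURCE = "import os\nfrom __future__ import absolute_import\n"
GOOD_SOURCE = '"""Docstring."""\nfrom __future__ import absolute_import\nimport os\n'


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<sample>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("bad source compiles:", compiles(BAD_SOURCE))
    print("good source compiles:", compiles(GOOD_SOURCE))
```

Removing the import, as this change does, sidesteps the ordering rule entirely on Python 3.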
2020-07-06 20:03:57 -07:00
jingzhang36 ce51c591f3
fix: increase TFX version from 0.20.2 to 0.22.0. Fixes #4084, fixes #4114 (#4133)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* tfx 0.21.2 -> 0.22.2

* tfx 0.20.2 -> 0.22.0

* update requirements.txt
2020-07-02 00:40:46 -07:00
Yuan (Bob) Gong be13a819f6
chore: fix kjd/idna license location change (#4127) 2020-07-02 10:30:34 +08:00
Yuan (Bob) Gong 79e0ee2b49
chore: remove inactive reviewers (#4111)
* Update OWNERS

* Update OWNERS

* Update OWNERS

* Update OWNERS

* Update OWNERS

* Update OWNERS

* Update OWNERS

* Update OWNERS

* Update OWNERS
2020-06-30 19:10:06 -07:00
frozeNinK 8a2d11c96a
feat(backend): Make number of persistence worker goroutine configurable (#3904)
* Make number of persistence worker configurable

* address comments

* address comments

* address comments
2020-06-29 21:37:58 -07:00
Yuan (Bob) Gong 042ff09100
fix(backend): allow empty userid header prefix. Fixes #4091 (#4098) 2020-06-28 18:08:14 -07:00
Lida Li 91f08c4849
Validate resourcekey to avoid apiserver panic on invalid inputs (#3999) 2020-06-24 11:42:38 -07:00
Niklas Hansson 0f83eece66
chore(backend): mention the Bazel version requirements in the README.md (#3969)
* Update README.md

* Update backend/README.md
2020-06-24 16:58:07 +08:00
Yuan (Bob) Gong f456ee9768
doc(sdk/client): fix kfp-server-api py client's docstring format (#4047)
* Pull templates from upstream 4.3.1

* update templates according to OpenAPITools/openapi-generator/pull/6391

* regenerate python client
2020-06-23 06:21:45 -07:00
jingzhang36 6fdf03a164
[Backend] Bug fix: applying filter in listing versions (#4052)
* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* add filter for listing versions

* add another filter in test

* comment revision
2020-06-23 03:07:41 -07:00
Renmin 7f39f18db7
better native-keras based sample (#3900)
* move seq

* for test

* updated test
2020-06-22 02:24:40 -07:00
Alexey Volkov 0417f13dce
Metadata Writer - Added timeouts (#4037) 2020-06-22 01:40:39 -07:00
dushyanthsc 24423ffa5c
Metadata-Writer: Updates metadata writer to use mlmd 0.22.0 (#4027)
This change updates Metadata writer to use MLMD library version 0.22.0
2020-06-18 18:29:11 -07:00
jingzhang36 8553497c3c
Reduce ttl of persisted final workflow to 1 day (#4005)
* reduce ttl of persisted final workflow to 1 day

* add comment

* enable pagination when expanding experiment in both the home page and the archive page

* Revert "enable pagination when expanding experiment in both the home page and the archive page"

This reverts commit 5b672739dd.

* Address comments
2020-06-18 00:22:06 -07:00
Alexey Volkov 4f5a7f0c20
Metadata Writer - Stopped using artifact properties (#4004) 2020-06-16 21:40:39 -07:00
Alexey Volkov 8f8ac52c34
Cache - Deployer should check whether the secret is installed (#3992)
Fixes https://github.com/kubeflow/pipelines/issues/3815
2020-06-15 23:32:03 -07:00
Chen Sun fcd2559b2c
[Backend][Multi-user] Allow shared read in the special multi-user mode. (#3858)
* Allow shared read in the special multi-user mode.

* remove shared read on list functions until it's confirmed needed.
2020-06-14 00:37:55 -07:00
jingzhang36 2fa6e2b7f6
[Backend] Filter run on status (#3959)
* filter run on status

* unit test

* add assertion

* one more test for not equal

* verify generated args as well

* assertion
2020-06-11 01:32:56 -07:00
Yuan (Bob) Gong c6ac5e0b1f
[Python Client] Clean up generated python client template to facilitate version bump (#3937)
* Remove version from generated python client header comment

* Regenerate client

* Bump to 1.0.0-dev.2 to showcase version bump diff
2020-06-09 18:20:04 -07:00
jingzhang36 5f9e56a744
Only pending or running workflows are considered not-final (#3940)
* only pending or running workflows are considered not-final

* rephrase comment
2020-06-09 06:05:18 -07:00