116 Commits

Author SHA1 Message Date
jingzhang36 7aaecb1501 Add necessary data types to api and database to support pipeline version. (#1873)
* Add necessary data types/tables for pipeline version. Mostly based
on Yang's branch at https://github.com/IronPan/pipelines/tree/kfpci/.
Backward compatible.

* Modified comment

* Modify api converter in accordance with new pipeline (version) definition

* Change pipeline_store for DefaultVersionId field

* Add pipeline spec to pipeline version

* fix model converter

* fix a comment

* Add foreign key, pagination of list request, refactor code source

* Refactor code source

* Foreign key

* Change code source and package source type

* Fix ; separator

* Add versions table and modify existing pipeline apis

* Remove api pipeline definition change and leave it for later PR

* Add comment

* Make schema changing and data backfilling a single transaction

* Tolerate null default version id in code

* fix status

* Revise delete pipeline func

* Use raw query to migrate data

* No need to update versions status

* rename and minor changes

* accidentally removed a where clause

* Fix a model name prefix

* Refine comments

* Revise if condition

* Address comments

* address more comments

* Rearrange pipeline and version related parts inside CreatePipeline, to make them more separate.

* Add package url to pipeline version. Required when calling CreatePipelineVersionRequest

* Single code source url; remove pipeline id as sorting field; reformat

* resolve remote branch and local branch diff

* remove unused func

* Remove an empty line
2019-09-25 23:59:07 -07:00
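
The "single transaction" step in #1873 above (schema change plus data backfill, done with raw queries) can be sketched as follows. This is a minimal illustration, not the actual Go migration: SQLite is used only because it ships with Python, the table and column names other than `DefaultVersionId` are assumptions, and transactional behaviour for schema statements differs between databases.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Add a versions table and backfill it inside one explicit transaction."""
    conn.execute("BEGIN")
    try:
        # Schema change: a new column on the existing table plus a new table.
        conn.execute("ALTER TABLE pipelines ADD COLUMN DefaultVersionId TEXT")
        conn.execute(
            "CREATE TABLE pipeline_versions ("
            " UUID TEXT PRIMARY KEY,"
            " Name TEXT NOT NULL,"
            " PipelineId TEXT NOT NULL REFERENCES pipelines(UUID))"
        )
        # Backfill with raw SQL: one version per existing pipeline.
        # Reusing the pipeline UUID as the version UUID is an assumption.
        conn.execute(
            "INSERT INTO pipeline_versions (UUID, Name, PipelineId)"
            " SELECT UUID, Name, UUID FROM pipelines"
        )
        conn.execute("UPDATE pipelines SET DefaultVersionId = UUID")
        conn.execute("COMMIT")
    except Exception:
        # Any failure undoes both the schema change and the backfill.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the code manage BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE pipelines (UUID TEXT PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO pipelines VALUES (?, ?)", [("p1", "first"), ("p2", "second")]
)
migrate(conn)
rows = conn.execute(
    "SELECT UUID, DefaultVersionId FROM pipelines ORDER BY UUID"
).fetchall()
version_count = conn.execute("SELECT COUNT(*) FROM pipeline_versions").fetchone()[0]
```

Grouping both steps means a half-migrated state (new column present, no version rows) is never visible to readers of the database.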
Ajay Gopinathan d5316f0754 Use latest Tensorflow image for Tensorboard. (#2140)
* Use latest Tensorflow image for Tensorboard.

* Fix tests
2019-09-20 21:43:32 -07:00
Ning a855c5655b remove tfx-taxi sample in favor of the tfx oss sample (#2160) 2019-09-18 22:59:06 -07:00
xaniasd 726551b13d configure db host and port from config file (#1940)
* configure db host and port from config file

* Update client_manager.go

* Update client_manager.go
2019-09-18 02:19:29 -07:00
Kirin Patel 3726eb216c Fix bug where source and variables are not accessible to visualization (#2012)
* Fix bug where source and variables are not accessible to visualization

* Updated snapshot

* Removed test_generate_test_visualization_html_from_notebook

* Added test cases to ensure roc_curve, table, and tfdv visualizations can be generated

* Made test requirements identical to normal requirements

* Fixed source links

* Updated test_server.py to use table visualization

* Update .travis.yml

* Add logging to debug travis tests

* Add tensorflow back to requirements.txt

* Updated .travis.yml and requirements.txt, also added comment that specifies required libraries to run tests

* Testing TFDV visualization with different source

* Changed remote paths to be local due to timeout issues

* Removed visualization tests due to continued failure

* Reverted .gitignore and removed tensorflow from text_exporter pip install command

* Moved where dependencies are installed in .travis.yaml

* Revert "Made test requirements identical to normal requirements"

This reverts commit 7f11c43c44.

* Added pip install requirements to .travis file

* Removed new unit test and requirements.txt install

* Cleaned up tests and re-added test.py predefined visualization

* Cleanup
2019-09-10 08:34:53 -07:00
Kirin Patel aa8f0a2a6c Fix python syntax of TFMA visualization (#1972)
* Fix python syntax

* Update tfma.py
2019-09-05 14:44:57 -07:00
Kirin Patel 41b394b045 Add e2e visualization tests (#1981)
* Created visualization_api_test.go

* Updated BUILD.bazel files

* Removed clean_up from e2e test

* Revert "Removed clean_up from e2e test"

This reverts commit 82fd4f5a00.

* Update e2e tests to build visualizationserver and viewer-crd

* Fix bug where wrong image is set

* Fixed incorrect image names

* Fixed additional instance of incorrect image names
2019-08-30 13:54:10 -07:00
Kirin Patel 53f516e0f9 Changed isVisualizationServiceAlive implementation (#2004) 2019-08-30 12:48:30 -07:00
Kirin Patel 2eac092800 Improve visualization server docker image (#2003)
* Updated Dockerfile.visualization to take advantage of caching and switched base image

* Removed tensorflow from requirements.txt and added new package to third_party_licenses.csv
2019-08-30 12:48:23 -07:00
Kirin Patel 3cbbd87021 Fix ROC Curve visualization argument placeholder (#2002)
* Updated ROC curve argument placeholder

* Updated snapshot

* Fixed method of obtaining values from variables dict for ROC curve visualization
2019-08-30 12:48:16 -07:00
Kirin Patel 3cc1e01277 Added README.md for Python based visualizations (#1853)
* Added developer_guide.md for Python based visualizations

* Changed md file name to be README and added link to documentation page

* Updated README.md to match syntax of #1878

* Added architecture and known limitations sections to documentation

* Addressed PR comments

* Address offline feedback from @SinaChavoshi

* Removed limitation

#1951 changes how arguments are passed from the API server to the Python service. This now allows for multi-line comment support.

* Addressed PR comments
2019-08-30 11:06:57 -07:00
Kirin Patel 69d0328385 Remove stdout/stderr from predefined visualization (#1976)
* Add new template files

* Add statement to change template used depending on type of visualization

Now, non-custom visualizations will not show stdout and stderr messages to a user.

* Removed new template files

* Removed unused custom.css style file

* Added simpler way to hide logging for non-custom visualizations

* Set hide_logging based on if a cell is based on a file or custom code

* Updated exporter unit tests

* Removed deprecated logic to set template type based on visualization type

* Fixed test_create_cell_from_args_with_multiple_args and removed test.py due to changes made to create_cell_from_file function
2019-08-30 10:11:55 -07:00
IronPan bbfe5e09cc move pipeline runner service account to backend (#1988)
* move pipeline runner service account to backend

* revert swf change

* revert swf change

* update tests

* update tests

* Update compiler.py

revert python change for backward compatibility

* Update compiler.py

revert python change for backward compatibility
2019-08-29 16:03:14 -07:00
Kirin Patel c642889a47 Added load_datatables function call to table visualization (#1974) 2019-08-28 11:17:13 -07:00
Kirin Patel bd7eb77f9e Fix support for custom visualizations (#1951)
* Change how arguments are checked and provided for Python service

* Arguments no longer require a source if the type is specified to be custom
* If no source is provided for a custom visualization, it will no longer be provided to the Python service

* Added unit test to test that an empty source can be provided alongside custom visualizations

* Added support for custom code to be used to generate visualizations within Python service

* Added unit tests to cover support of custom visualizations

* Fixed logic that handles source addition and validation in API

* Formatted visualization_server_test.go

* Moved self.maxDiff to setup function

* Removed unused import

* Simplified how arguments are passed from API to Python service

Arguments are no longer manually converted to command line arguments to be passed to the Python service. Instead, they are converted to x-www-form-urlencoded arguments, which are sent to the Python service and then converted to a dictionary by the Python service.

* Made @staticmethods private functions
2019-08-27 19:25:11 -07:00
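
The argument passing described in #1951 above (form-encode on the API side, decode to a dictionary on the Python side) can be sketched with the standard library alone. The field names and the JSON-encoded `arguments` value are assumptions chosen for illustration; the real request shape is not shown in this log.

```python
import json
from urllib.parse import parse_qs, urlencode

# Sender side: hypothetical fields of a visualization request.
request_fields = {
    "type": "custom",
    "source": "",  # custom visualizations may omit a source
    "arguments": json.dumps({"code": ["print('line 1')", "print('line 2')"]}),
}
body = urlencode(request_fields)  # x-www-form-urlencoded payload

# Receiver side: decode the payload back into a flat dictionary.
# keep_blank_values=True preserves the empty "source" field.
parsed = parse_qs(body, keep_blank_values=True)
received = {key: values[0] for key, values in parsed.items()}
arguments = json.loads(received["arguments"])
```

Because the payload is percent-encoded rather than spliced into a command line, multi-line values survive the round trip without extra quoting.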
Kirin Patel 677ffe2dc6 Exclude visualization types from flake8 testing (#1925)
* Moved visualization types to types subdirectory

* Updated flake8 test to ignore types subdirectory
2019-08-23 16:26:28 -07:00
Kirin Patel 3c8952e6bf Add TFDV, TFMA, and Table visualization support for Python based visualizations (#1898)
* Added table and tfdv visualization

Also fixed issue surrounding ApiVisualizationType enum

* Fixed table visualization

* Removed byte limit
* Fixed issue where headers would not properly be applied
* Fixed issue where table would not be interactive

* Updated table visualization to reflect changes made to dependency injection

* Fixed bug where checking if headers is provided to table visualizations could crash visualization

* Added TFMA visualization

* Updated new visualizations to match syntax of #1878

* Updated test snapshots to account for TFMA visualization

* Small if statement syntax changes

* Add flake8 noqa comments to table.py and tfma.py
2019-08-22 13:57:20 -07:00
Kirin Patel 56160f1c83 Added support for environment specified kernel timeout (#1920) 2019-08-22 09:26:33 -07:00
Kirin Patel 8c3d6fe121 Add visualization-server service to lightweight deployment (#1844)
* Add visualization-server service to lightweight deployment

* Addressed PR suggestions

* Added field to determine if visualization service is active and fixed unit tests for visualization_server.go

* Additional small fixes

* port change from 88888 -> 8888
* version change from 0.1.15 -> 0.1.26
* removed visualization-server from base/kustomization.yaml

* Fixed visualization_server_test.go to reflect new changes

* Changed implementation to be fail fast

* Changed host name to be constant provided by environment

* Added retry and extracted isVisualizationServiceAlive logic to function

* Fixed deployment.yaml file

* Fixed serviceURL configuration issue

serviceURL is now properly obtained from the environment; the service IP address and port are used rather than the service name and namespace

* Added log message to indicate when visualization service is unreachable

* Addressed PR comments

* Removed _HTTP
2019-08-21 18:30:33 -07:00
Christian Clauss 8e1e823139 Lint Python code for undefined names (#1721)
* Lint Python code for undefined names

* Lint Python code for undefined names

* Exclude tfdv.py to workaround an overzealous pytest

* Fixup for tfdv.py

* Fixup for tfdv.py

* Fixup for tfdv.py
2019-08-21 15:04:31 -07:00
dushyanthsc 23993486c5 apiserver: Remove TFX output artifact recording to metadatastore (#1904) 2019-08-21 13:44:31 -07:00
IronPan 3b7340f258 Change the type of resource reference payload column (#1905)
Gorm doesn't automatically change the type of a column. Commit 4e43750c9d (diff-c4afa92d7e54eecff0a482cf57490aa8R40) introduced a column type change that might not take effect for an existing cluster being upgraded.

/assign @hongye-sun
2019-08-21 12:28:31 -07:00
Kirin Patel a4991fd81a Enable error propagation from nbconvert to frontend (#1909) 2019-08-21 11:56:31 -07:00
IronPan 0864fafbbc Use single part as default (#1893)
The data stored in artifact storage is usually small, so using multi-part is not strictly a requirement.
Change the default to true to better support more platforms out of the box.
2019-08-20 14:09:19 -07:00
Andy Wei 60fd70c093 Make backend apiserver mysql dbname configurable (#1714)
* make mysql dbname configurable

* solve conflict

* move declaration out to fix scope
2019-08-20 00:33:33 -07:00
Kirin Patel ea67c998b6 Change how Variables are Provided to Visualizations (#1754)
* Changed way visualization variables are passed from request to NotebookNode

Visualization variables are now saved to a json file and loaded by a NotebookNode upon execution.

* Updated roc_curve visualization to reflect changes made to dependency injection

* Fixed bug where checking if is_generated is provided to roc_curve visualization would crash visualization

Also changed ' -> "

* Changed text_exporter to always sort variables by key for testing

* Addressed PR suggestions
2019-08-15 16:18:35 -07:00
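
The mechanism in #1754 above (variables written to a JSON file, then loaded when a notebook cell executes) can be sketched as below. The helper names, the file name, and the use of `exec` to stand in for a notebook kernel are all assumptions for illustration; the actual service builds a NotebookNode instead.

```python
import json
import tempfile
from pathlib import Path


def write_variables(variables: dict, directory: Path) -> Path:
    """Persist request variables; keys are sorted so output is deterministic."""
    path = directory / "variables.json"
    path.write_text(json.dumps(variables, sort_keys=True))
    return path


def loader_cell_source(path: Path) -> str:
    """Return source for a cell that loads the variables at execution time."""
    return (
        "import json\n"
        "with open({!r}) as f:\n"
        "    variables = json.load(f)\n"
    ).format(str(path))


with tempfile.TemporaryDirectory() as tmp:
    path = write_variables(
        {"source": "gs://bucket/data.csv", "is_generated": True}, Path(tmp)
    )
    cell = loader_cell_source(path)
    namespace: dict = {}
    exec(cell, namespace)  # stands in for the kernel running the cell
    loaded = namespace["variables"]
```

Passing values through a file avoids templating them into the cell's source code, so quotes and newlines in a value cannot break the generated code.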
IronPan 39e5840f2f Add retry button in Pipeline UI (#1782)
* add retry button

* add retry button

* add retry button

* address comments

* fix tests

* fix tests

* update image

* Update StatusUtils.test.tsx

* Update RunDetails.test.tsx

* Update Buttons.ts

* update test

* update frontend

* update

* update

* address comments

* update test
2019-08-15 11:04:34 -07:00
IronPan f1c9594f7d Propagate pipeline name in pipeline spec (#1842)
* update

* add pipeline name

* propagate name

* propagate name

* fix

* update test
2019-08-14 23:30:32 -07:00
IronPan 1e9c14f7fa Improve sql efficiency for getting the run (#1835) 2019-08-14 17:19:06 -07:00
IronPan c6e9d47418 Create composite indexes for [ResourceType, ReferenceUUID, ReferenceType] (#1836)
This should speed up the list runs call, which filters on [ResourceType, ReferenceUUID, ReferenceType].
We've seen cases where list runs takes a long time when the resource_references table is large.

```
SELECT
   subq.*,
   CONCAT("[", GROUP_CONCAT(r.Payload SEPARATOR ", "), "]") AS refs 
FROM
   (
      SELECT
         rd.*,
         CONCAT("[", GROUP_CONCAT(m.Payload SEPARATOR ", "), "]") AS metrics 
      FROM
         (
            SELECT
               UUID,
               DisplayName,
               Name,
               StorageState,
               Namespace,
               Description,
               CreatedAtInSec,
               ScheduledAtInSec,
               FinishedAtInSec,
               Conditions,
               PipelineId,
               PipelineSpecManifest,
               WorkflowSpecManifest,
               Parameters,
               pipelineRuntimeManifest,
               WorkflowRuntimeManifest 
            FROM
               run_details 
            WHERE
               UUID in 
               (
                  SELECT
                     ResourceUUID 
                  FROM
                     resource_references as rf 
                  WHERE
                     (
                        rf.ResourceType = 'Run' 
                        AND rf.ReferenceUUID = '488b0263-f4ee-4398-b7dc-768ffe967372' 
                        AND rf.ReferenceType = 'Experiment'
                     )
               )
               AND StorageState <> 'STORAGESTATE_ARCHIVED' 
            ORDER BY
               CreatedAtInSec DESC,
               UUID DESC LIMIT 6
         )
         AS rd 
         LEFT JOIN
            run_metrics AS m 
            ON rd.UUID = m.RunUUID 
      GROUP BY
         rd.UUID
   )
   AS subq 
   LEFT JOIN
      (
         select
            * 
         from
            resource_references 
         where
            ResourceType = 'Run'
      )
      AS r 
      ON subq.UUID = r.ResourceUUID 
GROUP BY
   subq.UUID 
ORDER BY
   CreatedAtInSec DESC,
   UUID DESC

```

/assign @hongye-sun
2019-08-14 13:46:35 -07:00
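
The effect of the composite index from #1836 above on the inner subquery can be sketched as follows. SQLite stands in for MySQL purely so the example is self-contained, and the index name and reduced column set are assumptions; query planners differ, so this shows the idea rather than the production plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_references ("
    " ResourceUUID TEXT, ResourceType TEXT,"
    " ReferenceUUID TEXT, ReferenceType TEXT, Payload TEXT)"
)
# Composite index over the three columns used in the WHERE clause.
conn.execute(
    "CREATE INDEX idx_reference_filter ON resource_references"
    " (ResourceType, ReferenceUUID, ReferenceType)"
)

# Ask the planner how it would run the filtering subquery.
plan = conn.execute(
    "EXPLAIN QUERY PLAN"
    " SELECT ResourceUUID FROM resource_references"
    " WHERE ResourceType = ? AND ReferenceUUID = ? AND ReferenceType = ?",
    ("Run", "488b0263-f4ee-4398-b7dc-768ffe967372", "Experiment"),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the description
print(plan_text)
```

With equality conditions on all three indexed columns, the planner can seek directly to the matching rows instead of scanning the whole table.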
IronPan 4e43750c9d add reference name to resource reference API proto (#1781)
* add resource reference

* fix tests

* fix tests

* fix e2e test

* fix e2e

* fix test

* update api requirement

* fix tests

* Update job_api_test.go

* Update run_api_test.go

* Update setup.py

* Update deploy-kubeflow.sh

* fix tests

* Update deploy-kubeflow.sh
2019-08-13 03:15:11 -07:00
IronPan 6189681436 Garbage collect the completed workflow after persisted to database (#1802)
* garbage collect the completed workflow

* add tests

* fix tests

* fix tests

* update e2e test

* update logic

* update logic

* update logic

* fix tests

* fix tests

* update
2019-08-12 23:53:18 -07:00
IronPan 5793cb2e1e Fix the broken sample path in API (#1805)
* Fix the broken sample path in API

Related change b476a848d9

/hold
Need to investigate why the test doesn't catch it.

/assign @numerology @hongye-sun

* Update sample_config.json
2019-08-11 00:35:11 -07:00
IronPan 957990c78e fix update bug (#1765) 2019-08-08 13:00:34 -07:00
Kirin Patel 48173ad776 Add Visualization Server to Cloud Build yaml Files (#1738)
* Added visualization server to Cloud Build yaml files

* Add pymongo to third_party_licenses.csv
2019-08-08 00:03:21 -07:00
IronPan a9602fbc3f Add API to rerun the pipeline (#1720)
* add resubmit proto

* add compiled code

* fix

* add resubmit proto

* add

* refactor

* update builder

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* add test

* add test

* add test

* add test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* address comments

* add comments

* change request body def

* recompile api

* retry instead of resubmit

* update test

* update test

* fix tests

* fix tests

* fix tests

* robust retry

* robust retry

* robust retry

* robust retry

* robust retry

* robust retry

* robust retry

* robust retry

* robust retry

* add error handling

* reorder the call

* remove logic to update the database entry

* add mock

* add tests for resource manager

* update error handling logic

* fix tests

* address comments
2019-08-07 13:59:06 -07:00
Kirin Patel fa1abde7f6 Rename InputPath -> Source for Visualization API definition (#1717)
* InputPath -> Source

* Changed name of data path/pattern variable from InputPath to Source to improve consistency with current visualization method
* Updated unit tests to reflect name change
* Regenerated swagger definitions to reflect name change

* Readded test that was removed with previous commit

It was deleted by mistake
2019-08-05 10:55:50 -07:00
nirsagi 44f81980ed Support Single part PutFile (#1713)
* Add env var for single part support.

AddFile gains the option to send a single part, for backends that don't support multipart.
To use it, set the ObjectStoreConfig.Disable.Multipart value to true.

* few changes

* remove redundant import

* remove newlines
2019-08-02 23:07:52 -07:00
Riley Bauer 9a48d297dd Allows creation of jobs without experiments (#1702)
* Allows creation of jobs without experiments

Such jobs will be placed within the Default experiment

* Update job server test
2019-08-02 13:09:51 -07:00
Ning e8a6feb229 Restructure samples (#1710)
* restructure samples
* update apiserver and sample test for the new location
2019-08-01 17:31:37 -07:00
Kirin Patel 65233ffa3b Add code for python visualization service (#1651)
* Setup initial server with roc_curve visualization

* Created Dockerfile.visualization

* Fixed import issue

* Changed implementation of generate_html_from_notebook to allow template type to be specified

* Added tfdv.py

* Added unit tests for exporter.py

* Deleted __init__.py

* visualizations/ -> visualization/

* Added requirements.txt and updated Dockerfile.visualization to use it

* Updated .travis.yml to run python visualization unit tests

* Fixed travis file path issue

* Continued testing to fix travis test issues

* Removed jupyter from pip3 install

Previously included to ensure python3 kernel was accessible to jupyter_client.

* Updated requirements.txt to include ipykernel

* Removed maxDiff limit for all python tests

* Sorted keys within args dictionary to ensure tests do not fail due to dictionary order

* Created requirements-test.txt

* Added input_path argument support for python service

Also adds check for missing input_path argument and returns 400 error if argument is missing.

* Updated Copyright in Dockerfile.visualization

* Updated snapshot to include all tests

* Added types, additional comments, and TemplateType enum

Also made additional style changes

* Formatted template files

* Addressed most feedback made by @kevinbache

* Revert "Formatted template files"

This reverts commit a7afd7b8af. This was done due to issues faced by the templating engine.

* Fixed comment placement and switched os -> Path

* Changed way exporter is implemented to use importlib

* Reverted to str.format due to python compatibility issue

Python 3.6 introduced support for f-strings, so the tests fail when run in a python 3.5 environment

* Added unit tests for tornado web server

* Added license script for open source compliance

* Added line between file comment and license to match exporter.py

* Updated server structure

* Created Exporter class
* Introduced ability to specify visualization timeout (default is 100 seconds)
* Added more comments
* Broke up post function in VisualizationHandler to call multiple functions rather than handling all logic within post function
* Updated imports
* Updated tests

* Addressed additional feedback from @kevinbache

* Fixed snapshot for test_exporter

* Comments -> Docstring Comments and other small fixes

* Fixed missing and incorrect typings
* shutdown_kernel is now private method of Exporter class

* Added missing and updated docstring comments in server.py

* Resolved latency issue with visualization server

The issue stemmed from recreating an exporter object per request; it was resolved by creating a global exporter.
2019-08-01 11:30:15 -07:00
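
The latency fix at the end of #1651 above (one global exporter instead of one per request) follows a common pattern that can be sketched as below. The `Exporter` class here is a hypothetical stand-in with a construction counter, and `handle_request` is an assumed name; only the 100-second default timeout is taken from the commit message.

```python
class Exporter:
    """Hypothetical stand-in for an object that is expensive to construct."""

    instances_created = 0

    def __init__(self, timeout_seconds: int = 100) -> None:
        # In a real service this is where a kernel or similar would start.
        Exporter.instances_created += 1
        self.timeout_seconds = timeout_seconds

    def render(self, source: str) -> str:
        return "<pre>{}</pre>".format(source)


# Created once at import time and shared by every request.
_EXPORTER = Exporter()


def handle_request(source: str) -> str:
    """Serve a request without paying the construction cost again."""
    return _EXPORTER.render(source)


responses = [handle_request("cell {}".format(i)) for i in range(3)]
```

The trade-off is shared state: a global instance is only safe if it holds nothing request-specific between calls.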
Riley Bauer f63b57b66c Clears the workflow's name in GetWorkflowSpec and uses it for the GenerateName (#1689) 2019-07-26 14:09:58 -07:00
Kirin Patel efff09cc1f Add visualization server and unit tests for visualization server (#1647)
* Added visualization server

* Updated function names, added comments, and made serviceURL a property of VisualizationService

* Added unit tests for visualizaiton.go

* Updated BUILD.bazel

* Addressed PR comments made by @IronPan

* Wrapped input_path argument value in string when generating python arguments

* GenerateVisualizationFromRequest -> generateVisualizationFromRequest

* Addressed additional PR feedback from @IronPan

* Removed getArgumentsAsJSONFromRequest
* Removed createPythonArgumentsFromRequest
* Moved `fmt.Sprintf` to generateVisualizationFromRequest
* Updated tests to reflect changes

* Added missing word to comment
2019-07-24 18:33:51 -07:00
IronPan d2ead9e056 update persistent agent to only store the argo spec (#1634)
* update persistent agent to only store the argo spec

Currently, when persisting the runs spawned from a job, PersistentAgent stores more information than needed in the pipeline manifest and also misses the TypeMetadata. As a result, a lot of unneeded runtime information is stored.

Example WorkflowSpec
```
   "metadata":{
      "name":"fffpw4fh-2-2911767673",
      "namespace":"kubeflow",
      "selfLink":"/apis/argoproj.io/v1alpha1/namespaces/kubeflow/workflows/fffpw4fh-2-2911767673",
      "uid":"de23bd5c-a8c1-11e9-a176-42010a800233",
      "resourceVersion":"3975687",
      "generation":1,
      "creationTimestamp":"2019-07-17T18:37:02Z",
      "labels":{
         "scheduledworkflows.kubeflow.org/isOwnedByScheduledWorkflow":"true",
         "scheduledworkflows.kubeflow.org/scheduledWorkflowName":"fffpw4fh",
         "scheduledworkflows.kubeflow.org/workflowEpoch":"1563388612",
         "scheduledworkflows.kubeflow.org/workflowIndex":"2",
         "workflows.argoproj.io/phase":"Running"
      },
      "ownerReferences":[
         {
            "apiVersion":"kubeflow.org/v1beta1",
            "kind":"ScheduledWorkflow",
            "name":"fffpw4fh",
            "uid":"91039a28-a8c1-11e9-a176-42010a800233",
            "controller":true,
            "blockOwnerDeletion":true
         }
      ]
   },
```

* Update workflow_test.go

* Update workflow.go

* Update resource_manager_test.go

* Update resource_manager.go

* Update workflow_test.go
2019-07-22 16:28:10 -07:00
IronPan 44712b3c4a Delete go CLI (#1592)
* delete go cli

* Delete cli_test.go

* Update BUILD.bazel

* Update .travis.yml

* Update BUILD.bazel
2019-07-19 01:11:25 -07:00
IronPan 974ab62552 propagate pwd (#1627) 2019-07-17 00:12:32 -07:00
jingzhang36 784c4f12b7 viewer controller is now namespaced so no need for cluster role (#1623)
* viewer controller is now namespaced so no need for cluster role

* our default namespaced install (kubeflow namespace) can also use Role instead of ClusterRole
2019-07-16 09:35:26 -07:00
jingzhang36 b957a9872c Viewer CRD controller running under namespace (#1562)
* Viewer CRD controller running under namespace

* Change docker file and add manifest deployment yaml to support the new flag namespace

* Change docker file to support new flag namespace for viewer crd controller

* Modify kustomization.yaml and namespaced-install.yaml

* Change file name from ml-pipeline-viewer-crd-deployment to ml-pipeline-viewer-crd-deployment-patch

* Fix typo

* Remove some duplicate configs in namespaced-install
2019-07-03 11:39:40 -07:00
Yaron Haviv 6f8d430e31 Apiserver s3 and MySQL env vars (#1455)
* making MySQL user and password configurable

* making minio parameters/secret configurable through env and file
2019-06-06 18:27:59 -07:00
Ajay Gopinathan 5f1b41171f Fix API package names and regenerate checked-in proto files. (#1404)
* Fix API package names and regenerate checked-in proto files. Also bump version of GRPC gateway used.

* Fix BUILD.bazel file for api as well.

* Update Bazel version
2019-05-30 11:26:28 -07:00