Commit Graph

995 Commits

Author SHA1 Message Date
William Black 59bb2c8f06
Add documentation with MkDocs (#268)
* Rename docs directory

* Add documentation dependencies

* Add documentation dependencies to `setup.py`

* Add javascripts

* Add CSS stylesheets

* Add TF logo

* Add readme as landing page

* Add basic mkdocs config

* Use correct index page

* Add images for index page

* Add install and getting started pages to navigation

* Add API docs

* Add docs workflow

* Remove deprecated code

* Check links with lychee in docs workflow

* Revert "Check links with lychee in docs workflow"

This reverts commit 21b34867eb.

* Add anomalies reference

* Fix bad links in docs

* Fix links that should be internal

* Run pre-commit
2025-06-23 16:03:32 -07:00
Andrew Fulton 34c32afeec
Remove zetasql (#272)
* adds linting configuration and workflow

* update error codes

* reformat linting ignore rules

* add patch to ignore hooks

* run linter

* fix formatting error

* Remove ZetaSQL from workspace

* add import for re2.BUILD, remove tfx_bsl sql_util dependencies

* fix test call

* don't test 3.9 right now

* fix tests

* revert tfx-bsl repo and branch to official

---------

Co-authored-by: smokestacklightnin <125844868+smokestacklightnin@users.noreply.github.com>
Co-authored-by: pdmurray <peynmurray@gmail.com>
2025-06-23 09:47:44 -07:00
Andrew Fulton 748bb7bfc4
CI Linting (#267)
* adds linting configuration and workflow

* update error codes

* reformat linting ignore rules

* add patch to ignore hooks

* run linter

* fix formatting error
2025-06-20 12:55:17 -07:00
Andrew Fulton 2f54c47480
Testing/Build Workflows (#266)
* Add build workflow via docker (#259)

* Add build workflow via docker

* rename docker-compose to docker compose

* add twine check and upload to PyPI

* add workflow_dispatch

* install twine before twine check

* add testing workflow

* single python

* trigger

* install in build job

* install pytest

* install test dependencies

* add xfail to tests

* add reusable workflows and add PR number in xfail

* fix composite action

* add more xfails

* xfail top_k_uniques_stats_generator_test.py

* xfails in partitioned_stats_generator_test.py

* more xfails

* add missing imports

* fix extra decorators

* more xfails

* Fix TAP and Kokoro tests caused by NumPy v2 migration.
1. To ensure test compatibility between NumPy v1 and v2 environments, we've adjusted the comparison tolerance to 1e-4. This accommodates slight variations (around 1e-4) in floating-point outcomes between the two NumPy versions. Additionally, we've modified the expected proto float to align with NumPy v2 results.
2. For mutual_information, NumPy v2 is able to handle values > 2**53 if the min and max of the examples are the same. However, since we need to be compatible with NumPy v1 and v2, for related unit tests, we check for the NumPy version before running the associated unit tests.

PiperOrigin-RevId: 681598675

* use xfail instead of skip

* remove xfails that are passing

* don't run xfail + add test deps

* fix build failure by pinning tensorflow_metadata

* move test requirements

* debugging

* more debugging

* remove upload for testing

* add environment variable to build nightly

* add extra-index-url

* trying to use nightly install

* revert debugging changes

* update upload artifact version

* revert metadata branch back to master

* fix typo

* remove install when built, move to only install on test

* change name of step checking the wheel after moving install to test workflow

* update PR number

* just remove PR

---------

Co-authored-by: Amit Kumar <dtu.amit@gmail.com>
Co-authored-by: tf-data-validation-team <tensorflow-extended-nonhuman@googlegroups.com>
2025-06-12 11:34:56 -07:00
tf-data-validation-team b4622cd568 TFDV 1.17.0 Release
PiperOrigin-RevId: 769185980
2025-06-09 09:54:55 -07:00
tf-data-validation-team fc03f17261 Add `Anomaly types` to the dataframe generated by `get_anomalies_dataframe`
PiperOrigin-RevId: 767192378
2025-06-04 10:12:49 -07:00
jinhuang 946dfc7e89 TFDV update for OSS build.
PiperOrigin-RevId: 762730883
2025-05-24 00:00:15 -07:00
tf-data-validation-team a6b4d3fd61 Fixes typo in docstring.
PiperOrigin-RevId: 757876640
2025-05-12 12:53:59 -07:00
tf-data-validation-team 73c2209ecf Fixes typo in docstring.
PiperOrigin-RevId: 753744384
2025-05-01 15:08:11 -07:00
tf-data-validation-team 247f161827 Update the TF version ceiling in Kokoro integration tests.
PiperOrigin-RevId: 736290502
2025-03-12 15:37:44 -07:00
tf-data-validation-team d37174bc22 no-op
PiperOrigin-RevId: 735887860
2025-03-11 14:20:14 -07:00
tf-data-validation-team 655c54bea2 Update type hints to account for Beam handling of Sequences
Improvements to Beam's type-hinting infrastructure found a breakage in this code caused by mismatched type hints (stemming from the one-way relationships between lists, sequences, and iterables). CoGroupByKey outputs Iterables, not Sequences, but these type checks were erroneously passing before.

PiperOrigin-RevId: 733394393
2025-03-04 11:06:49 -08:00
tf-data-validation-team 7d9f283c76 Fix the bug that occurs when calling run() inside a pipeline's context manager in Beam.
PiperOrigin-RevId: 729199597
2025-02-20 12:02:21 -08:00
tf-data-validation-team ab4eb7fcba Remove `srcs_version` and `python_version` attributes, as they already default to `"PY3"`
PiperOrigin-RevId: 721269125
2025-01-29 23:46:47 -08:00
tf-data-validation-team 530950cb44 Automated Code Change
PiperOrigin-RevId: 716962417
2025-01-18 03:20:00 -08:00
tf-data-validation-team 1861ea9bcc cleanup of deprecated test methods
PiperOrigin-RevId: 715994558
2025-01-15 16:35:36 -08:00
tf-data-validation-team 3876e25961 Automated Code Change
PiperOrigin-RevId: 714857857
2025-01-13 00:40:02 -08:00
tf-data-validation-team 71bb8a6b9f Remove TFDV nightly build dependencies to fix FI&TB nightly.
PiperOrigin-RevId: 714091498
2025-01-10 10:12:50 -08:00
tf-data-validation-team cb7b995065 Added suppressions for pytype --none-is-not-bool.
PiperOrigin-RevId: 698353624
2024-11-20 05:14:36 -08:00
tf-data-validation-team 036a88002e The result of the JSD calculation can be slightly greater than 1.0 due to floating-point error. This CL fixes the bug by capping the result at 1.0.
PiperOrigin-RevId: 692261708
2024-11-01 12:41:12 -07:00
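A minimal Python sketch of the capping described in the commit above, assuming base-2 logarithms (so the theoretical maximum of the Jensen-Shannon divergence is 1.0) and plain lists of probabilities. The function name `jensen_shannon_divergence` is hypothetical; this is not TFDV's actual implementation.

```python
import math
from typing import Sequence


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical base-2 JSD between two discrete distributions, capped at 1.0."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Mixture distribution M = (P + Q) / 2.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute 0 by convention; if a_i > 0 then
        # m_i > 0 as well, so the division is always safe.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Floating-point error can push the sum slightly above the theoretical
    # maximum of 1.0, so clamp it.
    return min(jsd, 1.0)
```

Clamping with `min` only hides rounding noise; any result meaningfully above 1.0 would still point to a real bug upstream.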
tf-data-validation-team 58af9e79ad Prepare code for breaking change in Protobuf C++ API.
Protobuf 6.30.0 will change the return types of Descriptor::name() and other methods to absl::string_view. This makes the code work both before and after such a change.

PiperOrigin-RevId: 689135228
2024-10-23 16:00:59 -07:00
Amit Kumar d3710e6275
Add testing workflow (#260)
* add testing workflow

* single python

* trigger

* install in build job

* install pytest

* install test dependencies

* add xfail to tests

* add reusable workflows and add PR number in xfail

* fix composite action

* add more xfails

* xfail top_k_uniques_stats_generator_test.py

* xfails in partitioned_stats_generator_test.py

* more xfails

* add missing imports

* fix extra decorators

* more xfails

* use xfail instead of skip

* remove xfails that are passing

* don't run xfail + add test deps
2024-10-21 08:59:28 -07:00
tf-data-validation-team 573c0e4cb9 TFDV 1.16.1 Release
PiperOrigin-RevId: 686197633
2024-10-15 12:25:21 -07:00
tf-data-validation-team 0a670b8897 Mark TFDV compatible with Protobuf v26+.
PiperOrigin-RevId: 686150589
2024-10-15 10:20:33 -07:00
tf-data-validation-team 4a18e91e34 TFDV 1.16.0 Release
PiperOrigin-RevId: 681656051
2024-10-02 17:42:09 -07:00
tf-data-validation-team 052aec82e1 Fix TAP and Kokoro tests caused by NumPy v2 migration.
1. To ensure test compatibility between NumPy v1 and v2 environments, we've adjusted the comparison tolerance to 1e-4. This accommodates slight variations (around 1e-4) in floating-point outcomes between the two NumPy versions. Additionally, we've modified the expected proto float to align with NumPy v2 results.
2. For mutual_information, NumPy v2 is able to handle values > 2**53 if the min and max of the examples are the same. However, since we need to be compatible with NumPy v1 and v2, for related unit tests, we check for the NumPy version before running the associated unit tests.

PiperOrigin-RevId: 681598675
2024-10-02 14:46:29 -07:00
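A minimal sketch of a tolerance-based float comparison like the one the commit above describes. `FLOAT_TOLERANCE` and `floats_match` are hypothetical names, and applying 1e-4 as an absolute tolerance is an assumption; the commit message does not say whether the tolerance is absolute or relative.

```python
import math
from typing import Sequence

# Hypothetical constant; 1e-4 is taken from the commit message, but treating
# it as an absolute tolerance is an assumption.
FLOAT_TOLERANCE = 1e-4


def floats_match(
    actual: Sequence[float], expected: Sequence[float], tol: float = FLOAT_TOLERANCE
) -> bool:
    """Return True if both sequences have equal length and each pair differs by at most tol."""
    if len(actual) != len(expected):
        return False
    # rel_tol=0.0 disables the relative check, so only abs_tol applies.
    return all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=tol) for a, e in zip(actual, expected)
    )
```

A tolerance of this size absorbs the roughly 1e-4 differences between NumPy v1 and v2 results while still flagging larger regressions.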
Amit Kumar bca0f85e33
Add build workflow via docker (#259)
* Add build workflow via docker

* rename docker-compose to docker compose

* add twine check and upload to PyPI

* add workflow_dispatch

* install twine before twine check
2024-09-30 23:45:31 -07:00
tf-data-validation-team 3a64d240af Fix a bug that caused custom validations to always report anomalies
PiperOrigin-RevId: 671804127
2024-09-06 10:02:23 -07:00
tf-data-validation-team 358631a5e1 Remove cc_api_version stage 4: deletion where cc_api_version = 2
PiperOrigin-RevId: 671095853
2024-09-04 14:14:37 -07:00
tf-data-validation-team 1c4cfff673 Fix users of NumPy APIs that are removed in NumPy 2.0.
This change migrates users of APIs removed in NumPy 2.0 to their recommended replacements (https://numpy.org/devdocs/numpy_2_0_migration_guide.html).

PiperOrigin-RevId: 655943417
2024-07-25 07:12:02 -07:00
zwestrick 03987b2bdc Fixes test flakiness caused by floating point nondeterminism by allowing some relative tolerance in feature proto comparison.
PiperOrigin-RevId: 651893670
2024-07-12 14:54:09 -07:00
tf-data-validation-team 7f2e655db2 internal only
PiperOrigin-RevId: 651442932
2024-07-11 09:49:53 -07:00
caveness 5eb246c626 no-op
PiperOrigin-RevId: 639688041
2024-06-03 01:17:53 -07:00
tf-data-validation-team d235b2b3db Allow dev and GitHub main versions to depend on corresponding TensorFlow
dev+main versions.

PiperOrigin-RevId: 631851771
2024-05-08 10:43:54 -07:00
tf-data-validation-team 9cfefc250c For nested features with N nested levels (N > 1), the statistics counting the number of values in `CommonStatistics` and `WeightedCommonStatistics` will rely on the innermost level.
PiperOrigin-RevId: 631265288
2024-05-06 19:57:44 -07:00
tf-data-validation-team a7059ac2cc Add a function to get the number of values in the innermost level of each array in the outermost level.
PiperOrigin-RevId: 630526074
2024-05-03 16:21:23 -07:00
tf-data-validation-team 0ba56da240 Updating TF version to use >=2.15,<2.16
PiperOrigin-RevId: 627890988
2024-04-24 16:48:44 -07:00
tf-data-validation-team 834a6ec1c8 TFDV 1.15.0 Release
PiperOrigin-RevId: 627607016
2024-04-23 22:14:14 -07:00
tf-data-validation-team eafa03d476 Internal change
PiperOrigin-RevId: 626057117
2024-04-18 09:21:45 -07:00
tf-data-validation-team 71b4573b62 Support counter stats for custom empty values. These don't count top-level nulls.
PiperOrigin-RevId: 625828088
2024-04-17 15:39:12 -07:00
zwestrick c341ccfeda Fixes type hint for CombinerFeatureStatsGeneratorTest.assertCombinerOutputEqual
PiperOrigin-RevId: 622211080
2024-04-05 10:09:09 -07:00
tf-data-validation-team 0034be3737 Updating TensorFlow dependency to `~=2.15`
PiperOrigin-RevId: 615495595
2024-03-13 11:52:54 -07:00
tf-data-validation-team b4d79a597f Switch to PEP 508 style conditional dependencies.
PiperOrigin-RevId: 612943796
2024-03-05 13:08:32 -08:00
caveness 8a130fd949 Add standard histogram for level n value list length statistics
PiperOrigin-RevId: 611617202
2024-02-29 15:36:16 -08:00
tf-data-validation-team b4b9af69cb Depend on protobuf 3.20.3 for Python 3.9 and 3.10 and on 4.25.2 for Python 3.11.
PiperOrigin-RevId: 611585502
2024-02-29 13:54:09 -08:00
tf-data-validation-team da0f6bbc71 Internal change
PiperOrigin-RevId: 608623949
2024-02-20 09:00:01 -08:00
caveness 2875bb9b01 Update parts of OSS proto build to align with tfx-bsl.
PiperOrigin-RevId: 608016553
2024-02-17 15:28:19 -08:00
caveness b734f44800 Remove remaining parts of TFDV code that supported running on Windows.
PiperOrigin-RevId: 607811820
2024-02-16 15:04:57 -08:00
caveness 2e3e0979b5 no-op
PiperOrigin-RevId: 607746752
2024-02-16 11:20:15 -08:00
zwestrick 3ce33a2946 Adds a new experimental FeaturesConfig type to support disabling quantile sketches on a feature-by-feature basis, either with an allowlist or an excludelist.
PiperOrigin-RevId: 604782499
2024-02-06 15:27:32 -08:00