Library for exploring and validating machine learning data
APACHE-2.0 License
Published by jay90099 over 3 years ago
- Added support for `values_counts` for nested features.
- Depends on `absl-py>=0.9,<0.13`.
- Depends on `tensorflow-metadata>=0.29,<0.30`.
- Depends on `tfx-bsl>=0.29,<0.30`.

Published by jay90099 over 3 years ago
- Depends on `numpy>=1.16,<1.20`.
- Accept `bytes` type in `get_feature_value_slicer`, in addition to `Text` and `int`.
- … `tfdv.infer_schema` and `tfdv.update_schema` were called with `infer_feature_shape=True`.
- … `infer_feature_shape` of function `tfdv.update_schema`. `tfdv.update_schema` will …
- Deprecated `tfdv.StatsOptions.feature_whitelist` and added `feature_allowlist` as a replacement. The former will be removed in the next version.
- Added `get_schema_dataframe` and `get_anomalies_dataframe` utility functions.
- Depends on `apache-beam[gcp]>=2.28,<3`.
- Depends on `tensorflow-metadata>=0.28,<0.29`.
- Depends on `tfx-bsl>=0.28.1,<0.29`.

Published by dhruvesh09 over 3 years ago
- … `BasicStatsGenerator`.
- Added `compact()` and `setup()` interface to `CombinerStatsGenerator`, `CombinerFeatureStatsWrapperGenerator`, `BasicStatsGenerator`, `CompositeStatsGenerator`, and `ConstituentStatsGenerator`.
- Stopped depending on `tensorflow-transform`.
- Depends on `apache-beam[gcp]>=2.27,<3`.
- Depends on `pyarrow>=1,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3`.
- Depends on `tensorflow-metadata>=0.27,<0.28`.
- Depends on `tfx-bsl>=0.27,<0.28`.
- `tfdv.DecodeCSV` and `tfdv.DecodeTFExample` are deprecated. Use `tfx_bsl.public.tfxio.CsvTFXIO` and `tfx_bsl.public.tfxio.TFExampleRecord` instead.
Published by jay90099 almost 4 years ago
- … `per_feature_weight_override` … `StatsOptions.__init__`.
- … `tfdv.GenerateStatistics()`.
- Depends on `apache-beam[gcp]>=2.25,!=2.26.*,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3`.
- Depends on `tensorflow-metadata>=0.26,<0.27`.
- Depends on `tensorflow-transform>=0.26,<0.27`.
- Depends on `tfx-bsl>=0.26,<0.27`.

Published by jay90099 almost 4 years ago
- Added support for detecting drift and distribution skew in numeric features.
- `tfdv.validate_statistics` now also reports the raw measurements of distribution skew/drift (if any such comparison is done), regardless of whether skew/drift is detected. The report is in the `drift_skew_info` field of the `Anomalies` proto (the return value of `validate_statistics`).
From this release, TFDV will also be hosting nightly packages on
https://pypi-nightly.tensorflow.org. To install the nightly package, use the
following command:

pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-data-validation

Note: these nightly packages are unstable and breakages are likely to happen.
Depending on the complexity involved, a fix can often take a week or more to
become available as wheels on the PyPI cloud service. You can always use the
stable version of TFDV available on PyPI by running the command
`pip install tensorflow-data-validation`.
- Added `tfdv.load_stats_binary` to load stats that were written using `tfdv.WriteStatisticsToText` (now `tfdv.WriteStatisticsToBinaryFile`).
- Fixed a bug that `import tensorflow_data_validation` would fail if IPython is not installed.
- Depends on `apache-beam[gcp]>=2.25,<3`.
- Depends on `tensorflow-metadata>=0.25,<0.26`.
- Depends on `tensorflow-transform>=0.25,<0.26`.
- Depends on `tfx-bsl>=0.25,<0.26`.
- `tfdv.WriteStatisticsToText` is renamed as `tfdv.WriteStatisticsToBinaryFile`. The former is still available but will be removed in a future version.

Published by dhruvesh09 about 4 years ago
- Depends on `apache-beam[gcp]>=2.24,<3`.
- Depends on `tensorflow-transform>=0.24.1,<0.25`.
- Depends on `tfx-bsl>=0.24.1,<0.25`.

Published by dhruvesh09 about 4 years ago

- Depends on `apache-beam[gcp]>=2.24,<3`.

Published by dhruvesh09 about 4 years ago
- … `python setup.py bdist_wheel`. Note: … `tfdv.visualize_statistics` …
- Depends on `absl-py>=0.9,<0.11`.
- Depends on `pandas>=1.0,<2`.
- Depends on `protobuf>=3.9.2,<4`.
- Depends on `tensorflow-metadata>=0.24,<0.25`.
- Depends on `tensorflow-transform>=0.24,<0.25`.
- Depends on `tfx-bsl>=0.24,<0.25`.
- Deprecated the `sample_count` option in `tfdv.StatsOptions`. Use `sample_rate` instead.
Published by dhruvesh09 about 4 years ago
- Depends on `apache-beam[gcp]>=2.23,<3`.
- Depends on `pyarrow>=0.17,<0.18`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3`.
- Depends on `tensorflow-metadata>=0.23,<0.24`.
- Depends on `tensorflow-transform>=0.23,<0.24`.
- Depends on `tfx-bsl>=0.23,<0.24`.

Published by dhruvesh09 over 4 years ago

Published by dhruvesh09 over 4 years ago
- Added `valency_and_presence_stats` in `CommonStatistics`.
- Depends on `pandas>=0.24,<2`.
- Depends on `tensorflow-metadata>=0.22.2,<0.23.0`.
- Depends on `tfx-bsl>=0.22.1,<0.23.0`.

Published by dhruvesh09 over 4 years ago
- Added `tfdv.get_slice_stats` to get statistics for a slice and `tfdv.compare_slices` to compare statistics of two slices using Facets.
- Made `tfdv.load_stats_text` and `tfdv.write_stats_text` public.
- Added `tfdv.WriteStatisticsToText` and `tfdv.WriteStatisticsToTFRecord` to write statistics proto to text and TFRecord files, respectively.
- Modified `tfdv.load_statistics` to handle reading statistics from TFRecord and text formats.
- Moved the mutual-information computation to an extra requirement, `mutual-information`. As a result, barebone TFDV does not require `scikit-learn` any more.
- Moved visualization to an extra requirement, `visualization`. As a result, barebone TFDV does not require `ipython` any more.
- Added an extra requirement, `all`, that specifies all the extra requirements: `pip install tensorflow-data-validation[all]`.
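The extras named above combine with pip's bracket syntax; a sketch of the resulting install commands (the quoting guards against shells that expand brackets):

```shell
# Barebone install: no visualization or mutual-information extras.
pip install tensorflow-data-validation

# Pull in the optional dependencies described above.
pip install 'tensorflow-data-validation[visualization]'       # ipython
pip install 'tensorflow-data-validation[mutual-information]'  # scikit-learn
pip install 'tensorflow-data-validation[all]'                 # all extras
```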
- Depends on `pyarrow>=0.16,<0.17`.
- Depends on `apache-beam[gcp]>=2.20,<3`.
- Depends on `tensorflow>=1.15,!=2.0.*,<3`.
- Depends on `tensorflow-metadata>=0.22.0,<0.23`.
- Depends on `tensorflow-transform>=0.22,<0.23`.
- Depends on `tfx-bsl>=0.22,<0.23`.
- `tfdv.GenerateStatistics` now accepts a PCollection of `pa.RecordBatch` instead of `pa.Table`.
- … now output `pa.RecordBatch` instead of `pa.Table`.
- `tfdv.validate_instances` and `tfdv.api.validation_api.IdentifyAnomalousExamples` now take `pa.RecordBatch` as input instead of `pa.Table`.
- The `StatsGenerator` interface (and all its sub-classes) now takes `pa.RecordBatch` as the input data instead of `pa.Table`.
- … take `pa.RecordBatch` instead of `pa.Table` as input and should output a tuple `(slice_key, record_batch)`.
Published by dhruvesh09 over 4 years ago
- Added `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when `label_feature` and `schema` are provided.
- … `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` on Python 3.5 + MacOS.

Published by dhruvesh09 over 4 years ago

Published by dhruvesh09 over 4 years ago
- `tfdv.TFExampleDecoder` has been removed. This legacy decoder converts `tf.Example` to a dict of numpy arrays, which is the legacy input format. Use `tfdv.DecodeTFExample` instead.

Published by dhruvesh09 over 4 years ago

Published by dhruvesh09 over 4 years ago
- … `tfx-bsl` (since tfx-bsl 0.15.2). This also brings performance improvements …
- Depends on `tensorflow-metadata>=0.21.0,<0.22`.
- Depends on `pyarrow>=0.15` (removed the upper bound, as it is determined by `tfx-bsl`).
- Depends on `tfx-bsl>=0.21.0,<0.22`.
- Depends on `apache-beam>=2.17,<3`.
- Changed the behavior regarding statistics over CSV data: removed `csv_decoder.DecodeCSVToDict`, as `Dict[str, np.ndarray]` has no longer been the internal data representation since 0.14.
Published by paulgc almost 5 years ago
- … the `weighted_num_examples` field in the statistics proto if a weight …
- Depends on `apache-beam[gcp]>=2.16,<3`.
- Depends on `six>=1.12,<2`.
- Depends on `scikit-learn>=0.18,<0.22`.
- Depends on `tfx-bsl>=0.15,<0.16`.
- Depends on `tensorflow-metadata>=0.15,<0.16`.
- Depends on `tensorflow-transform>=0.15,<0.16`.
- Depends on `tensorflow>=1.15,<3`.
  - `tensorflow` now comes with GPU support, so users won't need to choose between `tensorflow` and `tensorflow-gpu`.
  - `tensorflow` 2.0.0 is an exception and does not have GPU support. If `tensorflow-gpu` 2.0.0 is installed before installing `tensorflow-data-validation`, it will be replaced with `tensorflow` 2.0.0. Re-install `tensorflow-gpu` 2.0.0 if needed.

Published by paulgc about 5 years ago
Published by paulgc about 5 years ago
- Added `validate_examples_in_tfrecord`, which identifies anomalous …
- Added `validate_examples_in_csv`, which identifies anomalous …
- Refactored `BasicStatsGenerator` to take an Arrow table as input. Example batches are …
- Refactored `TopKUniquesStatsGenerator` and `TopKUniquesCombinerStatsGenerator` to …
- Added the `update_schema` API, which updates the schema to conform to statistics.
- … `validate_statistics` … `match_ratio` …
- … `__slots__` in accumulators.
- Added `load_anomalies_text` and `write_anomalies_text` utility functions.
- Added the `semantic_domain_stats_sample_rate` option to compute semantic domain …
- Added the `compression_type` option to `generate_statistics_from_*` methods.
- … `GenerateStatistics` generate a DatasetFeatureStatisticsList containing a …
- Depends on `absl-py>=0.7,<1`.
- Depends on `apache-beam[gcp]>=2.14,<3`.
- Depends on `numpy>=1.16,<2`.
- Depends on `pandas>=0.24,<1`.
- Depends on `pyarrow>=0.14.0,<0.15.0`.
- Depends on `scikit-learn>=0.18,<0.21`.
- Depends on `tensorflow-metadata>=0.14,<0.15`.
- Depends on `tensorflow-transform>=0.14,<0.15`.
- Change `examples_threshold` to `values_threshold` and update documentation to clarify that counts are of values in semantic domain stats generators.
- Refactor `IdentifyAnomalousExamples` to remove sampling and output (anomaly reason, example) tuples.
- Rename the `anomaly_proto` parameter in anomalies utilities to `anomalies`, to make it more consistent with proto and schema utilities.
- `FeatureNameStatistics` produced by `GenerateStatistics` is now identified by its `.path` field instead of the `.name` field. For example:

      feature {
        name: "my_feature"
      }

  becomes:

      feature {
        path {
          step: "my_feature"
        }
      }
- Change `validate_instance` API to accept an Arrow table instead of a Dict.
- Change `GenerateStatistics` API to accept Arrow tables as input.