nannyml

nannyml: post-deployment data science in python

APACHE-2.0 License

Downloads
10.7K
Stars
1.8K
Committers
33

nannyml - v0.10.6 Latest Release

Published by github-actions[bot] 5 months ago

Changed

  • Make predictions optional for performance calculation. When not provided, only AUROC and average precision will be calculated. (#380)
  • Small DLE docs updates
  • Combed through and optimized the reconstruction error calculation with PCA resulting in a nice speedup. Cheers @nikml! (#385)
  • Updated summary stats value limits to be in line with the rest of the library. Changed from np.nan to None. (#387)
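
AUROC can still be calculated without predictions because it depends only on the predicted scores and the targets, not on thresholded class labels. A minimal sketch of that idea in plain Python (the function name auroc is hypothetical and this is not NannyML's implementation):

```python
def auroc(y_true, y_score):
    """Rank-based AUROC: chance that a random positive outscores a random negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative target")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count for half
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))
```

Metrics such as accuracy or F1 need a thresholded prediction per row, which is why they drop out when the predictions column is missing.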

Fixed

  • Fixed a breaking issue in the sampling error calculation for the median summary statistic when there is only a single value for a column. (#377)
  • Drop identifier column from the documentation example for reconstruction error calculation with PCA. (#382)
  • Fix an issue where default threshold configurations would get changed when setting custom thresholds, bad mutables! (#386)
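
The "bad mutables" remark refers to a common Python pitfall: handing out a reference to a shared default object instead of a copy. The sketch below illustrates the general pattern only. The names DEFAULT_THRESHOLDS, configure_buggy and configure_fixed and the threshold values are hypothetical, not NannyML code:

```python
import copy

# Hypothetical module-level defaults, standing in for a library's default thresholds.
DEFAULT_THRESHOLDS = {"jensen_shannon": {"upper": 0.1}}


def configure_buggy(custom=None):
    thresholds = DEFAULT_THRESHOLDS       # same dict object, no copy is made
    thresholds.update(custom or {})       # silently mutates the shared defaults
    return thresholds


def configure_fixed(custom=None):
    thresholds = copy.deepcopy(DEFAULT_THRESHOLDS)  # private copy per call
    thresholds.update(custom or {})
    return thresholds


fixed = configure_fixed({"jensen_shannon": {"upper": 0.5}})
defaults_after_fixed = copy.deepcopy(DEFAULT_THRESHOLDS)   # still the original values

configure_buggy({"jensen_shannon": {"upper": 0.5}})
defaults_after_buggy = copy.deepcopy(DEFAULT_THRESHOLDS)   # defaults were overwritten
```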
nannyml - v0.10.5

Published by github-actions[bot] 7 months ago

Changed

  • Updated dependencies for Python 3.8 and up. (#375)

Added

  • Support for the average precision metric for binary classification in realized and estimated performance. (#374)
nannyml - v0.10.4

Published by github-actions[bot] 8 months ago

Changed

  • We've changed the default for the incomplete parameter in the SizeBasedChunker and CountBasedChunker
    from append to keep. This means that from now on, by default, you might have an additional
    "incomplete" final chunk. Previously these records would have been appended to the last "complete" chunk.
    This change was required for some internal developments, and we also felt it made more sense when looking at
    continuous monitoring (as the incomplete chunk will be filled up later as more data is appended). (#367)
  • We've renamed the Classifier for Drift Detection (CDD) to the more appropriate Domain Classifier. (#368)
  • Bumped the version of the pyarrow dependency to ^14.0.0 if you're running on Python 3.8 or up.
    Congrats on your first contribution here @amrit110, much appreciated!
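
The effect of the incomplete setting can be sketched in plain Python. This is an illustration of the keep, drop and append behaviours as described in these notes, not the chunker implementation. The function chunk_by_size is hypothetical, and keeping the leftover as its own chunk when no full chunk exists is an assumption of this sketch:

```python
def chunk_by_size(records, chunk_size, incomplete="keep"):
    """Split records into chunks of chunk_size and handle the leftover rows."""
    if incomplete not in ("keep", "drop", "append"):
        raise ValueError(f"unknown incomplete mode: {incomplete!r}")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    full_end = len(records) - len(records) % chunk_size
    chunks = [records[i:i + chunk_size] for i in range(0, full_end, chunk_size)]
    leftover = records[full_end:]

    if leftover:
        if incomplete == "keep":
            chunks.append(leftover)                # extra, smaller final chunk
        elif incomplete == "append":
            if chunks:
                chunks[-1] = chunks[-1] + leftover  # grow the last full chunk
            else:
                chunks.append(leftover)            # assumption: nothing to append to
        # "drop": the leftover rows are discarded
    return chunks


rows = list(range(7))
print(chunk_by_size(rows, 3, incomplete="keep"))
print(chunk_by_size(rows, 3, incomplete="append"))
print(chunk_by_size(rows, 3, incomplete="drop"))
```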

Fixed

  • Continuous distribution plots will now be scaled per chunk, as opposed to globally. (#369)
nannyml - v0.10.3

Published by github-actions[bot] 8 months ago

Fixed

  • Handle median summary stat calculation failing due to NaN values
  • Fix standard deviation summary stat sampling error calculation occasionally returning infinity (#363)
  • Fix plotting confidence bands when value gaps occur (#364)

Added

  • New multivariate drift detection method using a classifier and density ratio estimation.
nannyml - v0.10.2

Published by github-actions[bot] 8 months ago

Changed

  • Removed p-value based thresholds for Chi2 univariate drift detection (#349)
  • Change default thresholds for univariate drift methods to standard deviation based thresholds.
  • Add summary stats support to the Runner and CLI (#353)
  • Add unique identifier columns to included datasets for better joining (#348)
  • Remove unused confidence_deviation properties in CBPE metrics (#357)
  • Improved error handling: failing metric calculation for a single chunk will no longer stop an entire calculator.
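
A standard deviation based threshold derives alert limits from the spread of a metric over the reference chunks. A minimal sketch, assuming limits of mean ± 3 standard deviations (the multiplier, the population standard deviation and the function names are assumptions of this sketch, not taken from these notes):

```python
import statistics


def std_thresholds(reference_values, multiplier=3.0):
    """Return (lower, upper) limits as mean -/+ multiplier * std of reference values."""
    mean = statistics.fmean(reference_values)
    std = statistics.pstdev(reference_values)
    return mean - multiplier * std, mean + multiplier * std


def alerts(values, lower, upper):
    """Flag every value that falls outside the [lower, upper] band."""
    return [v < lower or v > upper for v in values]


reference = [0.10, 0.12, 0.11, 0.09, 0.13]   # drift metric per reference chunk
lower, upper = std_thresholds(reference)
flags = alerts([0.11, 0.30], lower, upper)    # drift metric per analysis chunk
print(lower, upper, flags)
```

Unlike a fixed p-value cut-off, this kind of limit adapts to how noisy the metric already is on reference data.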

Added

  • Add feature distribution calculators (#352)

Fixed

  • Fix join column settings for CLI (#356)
  • Fix crashes in UnseenValuesCalculator
nannyml - v0.10.1

Published by github-actions[bot] 11 months ago

  • Various small fixes to the docs, thanks once again ghostwriter @NeoKish! (#345)
  • Fixed an issue with estimated accuracy for multiclass classification in CBPE. (#346)
nannyml - v0.10.0

Published by github-actions[bot] 11 months ago

Changed

  • Telemetry now detects AKS and EKS and NannyML Cloud runtimes. (#325)
  • Runner was refactored, so it can be extended with premium NannyML calculators and estimators. (#325)
  • Sped up telemetry reporting to ensure it doesn't hinder performance.
  • Some love for the docs as @santiviquez tediously standardized variable names. (#338)
  • Optimize calculations for L-infinity method. (#340)
  • Refactored the CalibratorFactory to align with our other factory implementations. (#341)
  • Updated the Calibrator interface with *args and **kwargs for easier extension.
  • Small refactor to the ResultComparisonMixin to allow easier extension.

Added

  • Added support for directly estimating the confusion matrix of multiclass classification models using CBPE.
    Big thanks to our appreciated alumnus @cartgr for the effort (and sorry it took soooo long). (#287)
  • Added DatabaseWriter support for results from MissingValuesCalculator and UnseenValuesCalculator. Some
    excellent work by @bgalvao, thanks for being a long-time user and supporter!

Fixed

  • Fix issues with calculation and filtering in performance calculation and estimation. (#321)
  • Fix multivariate reconstruction error plot labels. (#323)
  • Log a warning when performance metrics for a chunk will return a NaN value. (#326)
  • Fix issues with ReadTheDocs build failing
  • Fix erroneous specificity calculation, both realized and estimated. Well spotted @nikml! (#334)
  • Fix threshold computation when dealing with NaN values. Major thanks to the eagle-eyed @giodavoli. (#333)
  • Fix exports for confusion matrix metrics using the DatabaseWriter. An inspiring commit that led to some other changes.
    Great job @shezadkhan137! (#335)
  • Fix incorrect normalization for the business value metric in realized and estimated performance. (#337)
  • Fix handling NaN values when fitting univariate drift. (#340)
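
For reference on the specificity fix: specificity is the true negative rate, TN / (TN + FP). These notes do not say what the error was, so the sketch below only shows the textbook definition for binary labels (the function name and the NaN return for the no-negatives case are choices of this sketch):

```python
def specificity(y_true, y_pred):
    """True negative rate: share of actual negatives predicted as negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tn + fp == 0:
        return float("nan")   # undefined when there are no actual negatives
    return tn / (tn + fp)


y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 0]
print(specificity(y_true, y_pred))
```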
nannyml - v0.9.1

Published by github-actions[bot] over 1 year ago

Changed

  • Updated Mendable client library version to deal with styling overrides in the RTD documentation theme
  • Removed superfluous limits for confidence bands in the CBPE class (these are present in the metric classes instead)
  • Threshold value limiting behaviour (e.g. overriding a value and emitting a warning) will be triggered not only when
    the value crosses the threshold but also when it is equal to the threshold value. This is because we interpret the
    threshold as a theoretical maximum.

Added

  • Added a new example notebook walking through a full use case using the NYC Green Taxi dataset, based on the blog of @santiviquez

Fixed

  • Fixed broken Docker container build due to changes in public Poetry installation procedure
  • Fixed broken image source link in the README, thanks @NeoKish!
nannyml - v0.9.0

Published by github-actions[bot] over 1 year ago

Changed

  • Updated API docs for the nannyml.io package, thanks @maciejbalawejder (#286)
  • Restricted versions of numpy to be <1.25, since there seems to be a change in the roc_auc calculation somehow (#301)

Added

  • Support for Data Quality calculators in the CLI runner
  • Support for Data Quality results in Ranker implementations (#297)
  • Support mendable in the docs (#295)
  • Documentation landing page (#303)
  • Support for calculations with delayed targets (#306)

Fixed

  • Small changes to quickstart, thanks @NeoKish (#291)
  • Fix an issue passing *args and **kwargs in Result.filter() and subclasses (#298)
  • Fixed double listing of the binary dataset documentation page
  • Add missing thresholds to roc_auc in CBPE (#294)
  • Fix plotting issue due to introduction of additional values in the 'display names tuple' (#305)
  • Fix broken exception handling due to inheriting from BaseException and not Exception (#307)
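
The last fix matters because a generic `except Exception` handler does not catch classes that derive directly from BaseException. A small, self-contained demonstration (the exception class names are hypothetical):

```python
class BadLibraryError(BaseException):
    """Derives from BaseException: slips past `except Exception` handlers."""


class GoodLibraryError(Exception):
    """Derives from Exception: caught by ordinary handlers."""


def caught_by_generic_handler(exc_type):
    """Return True if `except Exception` catches an instance of exc_type."""
    try:
        raise exc_type("boom")
    except Exception:
        return True
    except BaseException:
        return False


print(caught_by_generic_handler(BadLibraryError))
print(caught_by_generic_handler(GoodLibraryError))
```

BaseException is meant for things like KeyboardInterrupt and SystemExit, so library errors normally inherit from Exception.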
nannyml - v0.8.6

Published by github-actions[bot] over 1 year ago

Added

  • Added new calculators to support simple data quality metrics such as counting missing or unseen values.
    For more information, check out the data quality tutorials.

Fixed

  • Fixed an issue where x-axis titles would appear on top of plots
  • Removed erroneous checks during calculation of realized regression performance metrics. (#279)
  • Fixed an issue dealing with az:// URLs in the CLI, thanks @michael-nml (#283)
nannyml - v0.8.5

Published by github-actions[bot] over 1 year ago

Changed

  • Applied new rules for visualizations. Estimated values will be the color indigo and represented with a dashed line.
    Calculated values will be blue and have a solid line. This color coding might be overridden in comparison plots.
    Data periods will no longer have different colors, we've added some additional text fields to the plot to indicate the data period.
  • Cleaned up legends in plots, since there will no longer be a different entry for reference and analysis periods of metrics.
  • Removed the lower threshold for default thresholds of the KS and Wasserstein drift detection methods.

Added

  • We've added the business_value metric for both estimated and realized binary classification performance. It allows
    you to assign a value (or cost) to true positive, true negative, false positive and false negative occurrences.
    This can help you track something like a monetary value or business impact of a model as a metric. Read more in the
    business value tutorials (estimated or realized) or the how it works page.
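
The idea behind a business value metric can be sketched as a weighted sum over confusion matrix cells. The matrix layout value_matrix[true][pred], the example values and the function name below are assumptions made for illustration, not the library's API:

```python
def business_value(y_true, y_pred, value_matrix):
    """Sum the value (or cost) of each prediction outcome.

    value_matrix[true_label][predicted_label] for binary labels 0/1:
    [[TN value, FP value],
     [FN value, TP value]]
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += value_matrix[t][p]
    return total


# Hypothetical values: a true positive earns 50, a false negative costs 20,
# a false positive costs 5, a true negative is neutral.
values = [[0, -5],
          [-20, 50]]
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(business_value(y_true, y_pred, values))
```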

Fixed

  • Sync quickstart of the README with the dedicated quickstart page. (#256)
    Thanks @NeoKish!
  • Fixed incorrect code snippet order in the thresholding tutorial. (#258)
    Thanks once more to the one and only @NeoKish!
  • Fixed broken container build that had sneakily been going on for a while
  • Fixed incorrect confidence band color in comparison plots (#259)
  • Fixed incorrect titles and missing legends in comparison plots (#264)
  • Fixed an issue where numerical series marked as category would cause issues during Chi2 calculation
nannyml - v0.8.4

Published by github-actions[bot] over 1 year ago

Changed

  • Updated univariate drift methods to no longer store all reference data by default (#182)
  • Updated univariate drift methods to deal better with missing data (#202)
  • Updated the included example datasets
  • Critical security updates for dependencies
  • Updated visualization of multi-level table headers in the docs (#242)
  • Improved typing support for Result classes using generics

Added

  • Support for estimating the confusion matrix for binary classification (#191)
  • Added treat_as_categorical parameter to univariate drift calculator (#239)
  • Added comparison plots to help visualize two different metrics at once

Fixed

  • Fix missing confidence boundaries in some plots (#193)
  • Fix incorrect metric names on plot y-axes (#195)
  • Fix broken links to external docs (#196)
  • Fix missing display name to performance calculation and estimation charts (#200)
  • Fix missing confidence boundaries for single metric plots (#203)
  • Fix incorrect code in example notebook for ranking
  • Fix result corruption when re-using calculators (#206)
  • Fix unintentional period filtering (#199)
  • Fixed some typing issues (#213)
  • Fixed missing data requirements documentation on regression (#215)
  • Corrections in the glossary (#214), thanks @sebasmos!
  • Fix missing threshold in plotting legend (#219)
  • Fix missing annotation in single row & column charts (#221)
  • Fix outdated performance estimation and calculation docs (#223)
  • Fix categorical encoding of unseen values for DLE (#224)
  • Fix incorrect legend for None timeseries (#235)
nannyml - v0.8.3

Published by github-actions[bot] over 1 year ago

Added

  • Added some extra semantic methods on results for easy property access. No dealing with multilevel indexes required.
  • Added functionality to compare results and plot that comparison. Early release version.

Fixed

  • Pinned Sphinx version to 4.5.0 in the documentation requirements.
    Version selector, copy toggle buttons and some styling were broken on RTD due to unintended usage of Sphinx 6 which
    treats jQuery in a different way.
nannyml - v0.8.2

Published by github-actions[bot] over 1 year ago

Changed

  • Log Ranker usage
  • Remove some redundant parameters in plot() function calls for data reconstruction results, univariate drift results,
    CBPE results and DLE results.
  • Support "single metric/column" arguments in addition to lists in class creation (#165)
  • Fix incorrect 'None' checks when dealing with defaults in univariate drift calculator
  • Multiple updates and corrections to the docs (thanks @nikml!), including:
    • Updating univariate drift tutorial
    • Updating README
    • Update PCA: How it works
    • Fix incorrect plots
    • Fix quickstart (#171)
  • Update chunker docstrings to match parameter names, thanks @mrggementiza!
  • Make sequence 'None' checks more readable, thanks @mrggementiza!
  • Ensure error handling in usage logging does not cause errors...
  • Start using OrdinalEncoder instead of LabelEncoder in DLE. This allows us to deal with "unseen" values in the
    analysis period.
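
Why an ordinal encoder helps with "unseen" values can be shown with a small stand-in class. This is a plain-Python sketch, not the scikit-learn class or the DLE code. In scikit-learn, OrdinalEncoder offers comparable behaviour through handle_unknown="use_encoded_value" together with unknown_value, while LabelEncoder raises an error on labels it has not seen during fit:

```python
class SimpleOrdinalEncoder:
    """Toy ordinal encoder that maps unseen categories to a sentinel value."""

    def __init__(self, unknown_value=-1):
        self.unknown_value = unknown_value
        self.mapping_ = {}

    def fit(self, values):
        # Assign integer codes in sorted category order.
        self.mapping_ = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Categories not seen during fit get the sentinel instead of raising.
        return [self.mapping_.get(v, self.unknown_value) for v in values]


encoder = SimpleOrdinalEncoder().fit(["red", "green", "blue"])   # reference period
encoded = encoder.transform(["green", "purple", "blue"])         # analysis period
print(encoded)
```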

Added

  • Added a Store to provide persistence for objects. Main use case for now is storing fitted calculators to be reused
    later without needing to fit on reference again. Current store implementation uses a local or remote filesystem as a
    persistence layer. Check out the documentation on persisting calculators.

Fixed

  • Fix incorrect interpretation of y_pred column as continuous values for the included sample binary classification data.
    Converting the column explicitly to "category" data type for now, update of the dataset to follow soon.
    (#171)
  • Fix broken image link in README, thanks @mrggementiza!
  • Fix missing key in the CLI section on raw files output, thanks @CoffiDev!
  • Fix upper and lower thresholds for data reconstruction being swapped (#179)
  • Fix stacked bar chart plots (missing bars + too many categories shown)
nannyml - v0.8.1

Published by github-actions[bot] almost 2 years ago

Changed

  • Thorough refactor of the nannyml.drift.ranker module. The abstract base class and factory have been dropped in favor
    of a more flexible approach.
  • Thorough refactor of our Plotly-based plotting modules. These have been rewritten from scratch to make them more
    modular and composable. This will allow us to deliver more powerful and meaningful visualizations faster.

Added

  • Added a new univariate drift method: the Hellinger distance, used for continuous variables.
  • Added an extensive write-up on when to use which univariate drift method.
  • Added a new way to rank the results of univariate drift calculation. The CorrelationRanker ranks columns based on
    the correlation between the drift value and the change in realized or estimated performance. Read all about it in the
    ranking documentation
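
For the Hellinger distance mentioned above, a minimal sketch of the textbook formula on two discrete probability vectors. For a continuous variable the vectors would come from binning reference and analysis data into the same histogram bins, and that binning step is left out here. This is not the library's implementation:

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


print(hellinger([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(hellinger([1.0, 0.0], [0.0, 1.0]))   # no overlap at all
```

The distance is bounded between 0 (identical) and 1 (disjoint), which makes it easy to threshold.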

Fixed

  • Disabled usage logging for our GitHub workflows
  • Allow passing a single string to the metrics parameter of the result.filter() function, as per special request.
nannyml - v0.8.0

Published by github-actions[bot] almost 2 years ago

Changed

  • Updated mypy to a new version, immediately resulting in some new checks that failed.

Added

  • Added new univariate drift methods: the Wasserstein distance for continuous variables,
    and the L-Infinity distance for categorical variables.
  • Added usage logging to our key functions. Check out the docs to find out more on what, why, how, and how to
    disable it if you want to.
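
The two distances added in this release can be sketched in plain Python. Both functions are simplified illustrations with hypothetical names, not the library's implementations. The Wasserstein sketch assumes two samples of equal size, where the 1-D distance reduces to the mean absolute difference between sorted values:

```python
from collections import Counter


def relative_frequencies(values):
    """Share of each category in a list of categorical values."""
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}


def l_infinity(reference, analysis):
    """Largest absolute difference in category frequency between two samples."""
    ref = relative_frequencies(reference)
    ana = relative_frequencies(analysis)
    categories = set(ref) | set(ana)
    return max(abs(ref.get(c, 0.0) - ana.get(c, 0.0)) for c in categories)


def wasserstein_equal_size(x, y):
    """1-D Wasserstein-1 distance for two samples with the same number of points."""
    if len(x) != len(y):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


print(l_infinity(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
print(wasserstein_equal_size([0, 1, 3], [5, 6, 8]))
```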

Fixed

  • Fixed and updated various parts of the docs, reported at warp speed! Thanks @NeoKish!
  • Fixed mypy issues concerning 'implicit optionals'.
nannyml - v0.7.0

Published by github-actions[bot] almost 2 years ago

Changed

  • Updated the handling of "leftover" observations when using the SizeBasedChunker and CountBasedChunker.
    Renamed the parameter for tweaking that behavior to incomplete, which can be set to keep, drop or append.
    Default behavior for both is now to append leftover observations to the last full chunk.
  • Refactored the nannyml.drift module. The intermediate structural level (model_inputs, model_outputs, targets)
    has been removed and turned into a single unified UnivariateDriftCalculator. The old built-in statistics have been
    re-implemented as Methods, allowing us to add new methods to detect univariate drift.
  • Simplified a lot of the codebase (but also complicated some bits) by storing results internally as multilevel-indexed
    DataFrames. This means we no longer have to 'convey information' by encoding data column names and method names in
    the names of result columns. We've introduced a new paradigm to deal with results. Drill down to the data you really
    need by using the filter method, which returns a new Result instance, with a smaller 'scope'. Then turn this
    Result into a DataFrame using the to_df method.
  • Changed the structure of the pyproject.toml file due to a Poetry upgrade to version 1.2.1.

Added

  • Expanded the nannyml.io module with new Writer implementations: DatabaseWriter that exports data into multiple
    tables in a relational database and the PickleFileWriter which stores the
    pickled Results on local/remote/cloud disk.
  • Added a new univariate drift detection method based on the Jensen-Shannon distance.
    Used within the UnivariateDriftCalculator.
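
The Jensen-Shannon distance is the square root of the Jensen-Shannon divergence between two distributions. A minimal sketch on discrete probability vectors, using base-2 logarithms so the result lies in [0, 1] (the log base, the function names and the omitted binning of raw data are assumptions of this sketch, not details from these notes):

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p == 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    m = [(a + b) / 2 for a, b in zip(p, q)]            # mixture distribution
    divergence = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    return math.sqrt(max(0.0, divergence))             # guard against tiny negatives


print(jensen_shannon_distance([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(jensen_shannon_distance([1.0, 0.0], [0.0, 1.0]))   # no overlap at all
```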

Fixed

  • Added lightgbm installation instructions to our installation guide.
nannyml - v0.6.3

Published by github-actions[bot] about 2 years ago

Changed

  • dependencybot dependency updates
  • stalebot setup

Fixed

  • CBPE now uses uncalibrated y_pred_proba values to calculate realized performance. Fixed for both binary and
    multiclass use cases (#98)
  • Fix an issue where reference data was rendered incorrectly on joy plots
  • Updated the 'California Housing' example docs, thanks for the help @NeoKish
  • Fix lower confidence bounds and thresholds under zero for regression cases. When the lower limit is set to 0,
    the lower threshold will not be plotted. (#127)
nannyml - v0.6.2

Published by github-actions[bot] about 2 years ago

Changed

  • Made the timestamp_column_name parameter, previously required by all calculators and estimators, optional. The main
    consequences of this are that plots now have a chunk-index based x-axis when no timestamp column name is given, and
    that you cannot chunk by period when the timestamp column name is not specified.

Fixed

  • Added missing s3fs dependency
  • Fixed outdated plotting kind constants in the runner (used by CLI)
  • Fixed some missing images and incorrect version numbers in the README, thanks @NeoKish!

Added

  • Added a lot of additional tests, mainly concerning plotting and the Runner class
nannyml - v0.6.1

Published by github-actions[bot] about 2 years ago

Changed

  • Use the problem_type parameter to determine the correct graph to output when plotting model output drift

Fixed

  • Fixed the wrong plot title being shown for DLE estimation result plots, thanks @NeoKish
  • Fixed incorrect plot kinds in some error feedback for the model output drift calculator
  • Fixed missing problem_type argument in the Quickstart guide
  • Fix incorrect visualization of confidence bands on reference data in DLE and CBPE result plots
Package Rankings
Top 6.64% on Proxy.golang.org
Top 28.69% on Conda-forge.org
Top 5.63% on Pypi.org