NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.

Apache-2.0 License

Downloads: 9.5K
Stars: 1K
Committers: 44

NVTabular - v1.3.1

Published by benfred about 2 years ago

What’s Changed

🔧 Maintenance

  • Tri up time @jperez999 (#1623)

NVTabular - v1.1.0

Published by benfred over 2 years ago

Known Issues

What's Changed

Full Changelog: https://github.com/NVIDIA-Merlin/NVTabular/compare/v1.0.0...v1.1.0

NVTabular - v1.0.0

Published by benfred over 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/NVIDIA-Merlin/NVTabular/compare/v0.11.0...v1.0.0

NVTabular - v0.11.0

Published by karlhigley over 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/NVIDIA-Merlin/NVTabular/compare/v0.10.0...v0.11.0

NVTabular - v0.10.0

Published by benfred over 2 years ago

What's Changed

Full Changelog: https://github.com/NVIDIA-Merlin/NVTabular/compare/v0.9.0...v0.10.0

NVTabular - v0.9.0

Published by benfred almost 3 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/NVIDIA-Merlin/NVTabular/compare/v0.8.0...v0.9.0

NVTabular - v0.8.0

Published by benfred almost 3 years ago

What's Changed

Full Changelog: https://github.com/NVIDIA-Merlin/NVTabular/compare/v0.7.1...v0.8.0

NVTabular - v0.7.1

Published by benfred almost 3 years ago

NVTabular v0.7.1 (2 November 2021)

Improvements

  • Add LogOp support for list features #1153
  • Add Normalize operator support for list features #1154
  • Add DataLoader.epochs() method and Dataset.to_iter(epochs=) argument #1147
  • Add ValueCount operator for recording of multihot min and max list lengths #1171
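
To make the last item concrete, here is a minimal sketch of what recording minimum and maximum list lengths for a multihot column could look like. It is plain Python, not the ValueCount operator itself; the function name `list_length_range` and the sample rows are hypothetical.

```python
def list_length_range(rows):
    """Return (min_length, max_length) over the lists in a multihot column.

    Hypothetical helper for illustration only; it is not part of NVTabular.
    """
    if not rows:
        raise ValueError("need at least one row to compute list lengths")
    lengths = [len(row) for row in rows]
    return min(lengths), max(lengths)


# Each row holds the category ids attached to one example.
genres = [[3, 7, 9], [4], [5, 6]]
min_len, max_len = list_length_range(genres)
print(min_len, max_len)
```

Statistics like these are typically useful downstream for sizing padded or fixed-length inputs.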

Bug Fixes

  • Fix Criteo inference #1198
  • Fix performance regressions in Criteo benchmark #1222
  • Fix error in JoinGroupby op #1167
  • Fix Filter/JoinExternal key error #1143
  • Fix LambdaOp transforming dependency values #1185
  • Fix reading parquet files with list columns from GCS #1155
  • Fix TargetEncoding with dependencies as the target #1165
  • Fix Categorify op to calculate unique count stats for Nulls #1159

NVTabular - v0.7.0

Published by benfred about 3 years ago

NVTabular v0.7.0

Improvements

  • Add column tagging API #943
  • Export dataset schema when writing out datasets #948
  • Make dataloaders aware of schema #947
  • Standardize a Workflows representation of its output columns #372
  • Add multi-gpu training example using PyTorch Distributed #775
  • Speed up reading Parquet files from remote storage like GCS or S3 #1119
  • Add utility to convert TFRecord datasets to Parquet #1085
  • Add multihot support for PyTorch inference #719
  • Add options to reserve categorical indices in the Categorify() op #1074
  • Update notebooks to work with CPU only systems #960
  • Save output from Categorify op in a single table for HugeCTR #946
  • Add a keyset file for HugeCTR integration #1049

Bug Fixes

  • Fix category counts written out by the Categorify op #1128
  • Fix HugeCTR inference example #1130
  • Fix make_feature_column_workflow bug in Categorify when features have vocabularies of varying sizes #1062
  • Fix TargetEncoding op on CPU only systems #976
  • Fix writing empty partitions to Parquet files #1097

NVTabular - v0.6.1

Published by benfred about 3 years ago

NVTabular v0.6.1

Bug Fixes

  • Fix installing package via pip #1030
  • Fix inference with groupby operator #1019
  • Install tqdm with conda package #1030
  • Fix workflow output_dtypes with empty partitions #1028

NVTabular - v0.6.0

Published by benfred about 3 years ago

NVTabular v0.6.0

Improvements

  • Add CPU support #534
  • Speed up inference on Triton Inference Server #744
  • Add support for session based recommenders #355
  • Add PyTorch Dataloader support for Sparse Tensors #500
  • Add ListSlice operator for truncating list columns #734
  • Categorical ids sorted by frequency #799
  • Add ability to select a subset of a ColumnGroup #809
  • Add option to use Rename op to give a single column a new fixed name #825
  • Add a 'map' function to KerasSequenceLoader, which enables sample weights #667
  • Add JoinExternal option on nvt.Dataset in addition to cudf #370
  • Allow passing ColumnGroup to get_embedding_sizes #732
  • Add ability to name LambdaOp and provide a better default name in graph visualizations #860
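
As an illustration of the "categorical ids sorted by frequency" item, the sketch below assigns the lowest ids to the most frequent values. It is plain Python and not NVTabular's implementation; the function name `frequency_sorted_ids`, the `start_index` parameter, and the choice to leave id 0 unused are assumptions made for the example.

```python
from collections import Counter


def frequency_sorted_ids(values, start_index=1):
    """Map each distinct value to an integer id, most frequent value first.

    Hypothetical helper for illustration; ids start at `start_index`
    so that 0 stays free (for example, for unseen values).
    """
    counts = Counter(values)
    # Sort by descending count; break ties by value so the result is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: start_index + i for i, value in enumerate(ordered)}


items = ["shoes", "hat", "shoes", "scarf", "hat", "shoes"]
mapping = frequency_sorted_ids(items)
print(mapping)
```

With this ordering, truncating an embedding table to the first N ids keeps the N most common categories.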

Bug Fixes

  • Fix make_feature_column_workflow for Categorical columns #763
  • Fix Categorify output dtypes for list columns #963
  • Fix inference for Outbrain example #669
  • Fix dask metadata after calling workflow.to_ddf() #852
  • Fix out of memory errors #896, #971
  • Fix normalize output when stdev is zero #993
  • Fix using UCX with a dask cluster on Merlin containers #872

NVTabular - v0.5.3

Published by benfred over 3 years ago

Bug Fixes

  • Fix Shuffling in Torch DataLoader #818
  • Fix "Unsupported type_id conversion" in triton inference for string columns #813
  • Fix HugeCTR inference backend Merlin#8

NVTabular - v0.5.2

Published by benfred over 3 years ago

Bug Fixes

  • Fix Movielens TF example running on 1080ti #792
  • Fix Multihot output from get_embedding_sizes #808
  • Fix accelerated training documentation #791

NVTabular - v0.5.1

Published by benfred over 3 years ago

Improvements

  • Update dependencies to use cudf 0.19
  • Removed conda from docker containers, leading to much smaller container sizes
  • Added CUDA 11.2 support
  • Added FastAI v2.3 support

Bug Fixes

  • Fix NVTabular preprocessing with HugeCTR inference

NVTabular - v0.5.0

Published by benfred over 3 years ago

Improvements

  • Add Horovod integration to NVTabular's dataloaders, allowing you to use multiple GPUs to train TensorFlow and PyTorch models
  • Add a Groupby operation for use with session-based recommender models
  • Add the ability to read and write datasets partitioned by a column
  • Add example notebooks for using Triton Inference Server with NVTabular
  • Restructure and simplify Criteo example notebooks
  • Add support for PyTorch inference with Triton Inference Server

Bug Fixes

  • Fix bug with preprocessing categorical columns with NVTabular not working with HugeCTR and Triton Inference Server #707

NVTabular - v0.4.0

Published by benfred over 3 years ago

Breaking Changes

  • The API for NVTabular has been significantly refactored, and existing code targeting the 0.3 API will need to be updated.
    Workflows are now represented as graphs of operations and applied using a scikit-learn 'transformers'-style API. Read more by
    checking out the examples.
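
The idea of a workflow as a graph of operations with a fit/transform interface can be sketched in a few lines of plain Python. This is a toy model, not NVTabular's code: the classes `Op`, `ToyCategorify`, `Node`, and `Workflow` below are hypothetical stand-ins, and the `>>` chaining is only meant to convey the style. Consult the project's examples for the actual API.

```python
class Op:
    """Hypothetical base operator: learns statistics in fit(), applies them in transform()."""

    def fit(self, values):
        return self

    def transform(self, values):
        return values


class ToyCategorify(Op):
    """Toy stand-in that maps each distinct value to an integer id (0 means unseen)."""

    def fit(self, values):
        self.ids = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        return [self.ids.get(v, 0) for v in values]


class Node:
    """A node in the operation graph: a column name followed by a chain of operators."""

    def __init__(self, column, ops=()):
        self.column = column
        self.ops = list(ops)

    def __rshift__(self, op):
        # `node >> op` returns a new node with `op` appended to the chain.
        return Node(self.column, self.ops + [op])


class Workflow:
    """Fits and applies every node, in the style of a scikit-learn transformer."""

    def __init__(self, nodes):
        self.nodes = nodes

    def fit(self, data):
        for node in self.nodes:
            values = data[node.column]
            for op in node.ops:
                # Each operator is fitted on the output of the previous one.
                values = op.fit(values).transform(values)
        return self

    def transform(self, data):
        out = dict(data)
        for node in self.nodes:
            values = data[node.column]
            for op in node.ops:
                values = op.transform(values)
            out[node.column] = values
        return out


train = {"user": ["b", "a", "b"], "clicks": [1, 0, 1]}
workflow = Workflow([Node("user") >> ToyCategorify()]).fit(train)
encoded = workflow.transform({"user": ["a", "b", "z"], "clicks": [0, 1, 0]})
print(encoded["user"])
```

Separating `fit` from `transform` means statistics are computed once on training data and then reused unchanged on validation or inference data.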

Improvements

  • Triton integration support for NVTabular with TensorFlow and HugeCTR models
  • Recommended cloud configuration and support for AWS and GCP
  • Reorganized examples and documentation
  • Unified Docker containers for Merlin components (NVTabular, HugeCTR and Triton)
  • Dataset analysis and generation tools

NVTabular - v0.3.0

Published by benfred over 3 years ago

Package Rankings
Top 4.56% on Pypi.org