featuretools

An open source Python library for automated feature engineering

BSD-3-CLAUSE License

Downloads
71.4K
Stars
7.1K
Committers
70

featuretools - v0.12.0

Published by rwedge almost 5 years ago

v0.12.0 Oct 31, 2019

  • Enhancements
    • Added First primitive (#770)
    • Added Entropy aggregation primitive (#779)
    • Allow custom naming for multi-output primitives (#780)
  • Fixes
    • Prevents user from removing base entity time index using additional_variables (#768)
    • Fixes error when a multioutput primitive was supplied to dfs as a groupby trans primitive (#786)
  • Changes
    • Drop Python 2 support (#759)
    • Add unit parameter to AvgTimeBetween (#771)
    • Require Pandas 0.24.1 or higher (#787)
  • Documentation Changes
    • Update featuretools slack link (#765)
    • Set up repo to use Read the Docs (#776)
    • Add First primitive to API reference docs (#782)
  • Testing Changes
    • CircleCI fixes (#774)
    • Disable PIP progress bars (#775)

Thanks to the following people for contributing to this release:
@ablacke-ayx, @BoopBoopBeepBoop, @jeffzi, @kmax12, @rwedge, @thehomebrewnerd, @twdobson
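
As a rough illustration of what an entropy aggregation such as the Entropy primitive added in this release computes, the sketch below calculates the Shannon entropy of a column of categorical values in plain Python. It is not the featuretools implementation; the function name `entropy`, its `base` argument, and the handling of missing values are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(values, base=None):
    """Shannon entropy of the empirical distribution of `values`.

    Hypothetical helper for illustration only; not part of featuretools.
    """
    values = [v for v in values if v is not None]  # assumption: skip missing values
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    # H = -sum(p * log(p)) over the observed categories
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if base is not None:
        h /= math.log(base)  # convert from nats to the requested base
    return h


uniform = entropy(["a", "b", "c", "d"], base=2)  # four equally likely values
constant = entropy(["a", "a", "a"])              # a single repeated value
```

A column with four equally frequent values carries two bits of entropy, while a constant column carries none, which is why entropy can serve as a per-group summary of how varied a categorical child column is.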

featuretools - v0.4.1

Published by rwedge about 5 years ago

v0.4.1 Nov 29, 2018

  • Resolve bug preventing using first column as index by default (#308)
  • Handle return type when creating features from Id variables (#318)
  • Make id an optional parameter of EntitySet constructor (#324)
  • Handle primitives with same function being applied to same column (#321)
  • Update requirements (#328)
  • Clean up DFS arguments (#319)
  • Clean up Pandas Backend (#302)
  • Update properties of cumulative transform primitives (#320)
  • Feature stability between versions documentation (#316)
  • Add download count to GitHub readme (#310)
  • Fixed #297 update tests to check error strings (#303)
  • Remove usage of fixtures in agg primitive tests (#325)

featuretools - v0.11.0

Published by rwedge about 5 years ago

v0.11.0 Sep 30, 2019

  • Enhancements
    • Improve how files are copied and written (#721)
    • Add number of rows to graph in entityset.plot (#727)
    • Added support for pandas DateOffsets in DFS and Timedelta (#732)
    • Enable feature-specific top_n value using a dictionary in encode_features (#735)
    • Added progress_callback parameter to dfs() and calculate_feature_matrix() (#739 #745)
    • Enable specifying primitives on a per column or per entity basis (#748)
  • Fixes
    • Fixed entity set deserialization (#720)
    • Added error message when DateTimeIndex is a variable but not set as the time_index (#723)
    • Fixed CumCount and other group-by transform primitives that take ID as input (#733, #754)
    • Fix progress bar undercounting (#743)
    • Updated training_window error assertion to only check against observations (#728)
    • Don't delete the whole destination folder while saving entityset (#717)
  • Changes
    • Raise warning and not error on schema version mismatch (#718)
    • Change feature calculation to return in order of instance ids provided (#676)
    • Removed time remaining from displayed progress bar in dfs() and calculate_feature_matrix() (#739)
    • Raise warning in normalize_entity() when time_index of base_entity has an invalid type (#749)
    • Remove toolz as a direct dependency (#755)
    • Allow boolean variable types to be used in the Multiply primitive (#756)
  • Documentation Changes
    • Updated URL for Compose (#716)
  • Testing Changes
    • Update dependencies (#738, #741, #747)

Thanks to the following people for contributing to this release: @angela97lin, @chidauri, @christopherbunn, @frances-h, @jeff-hernandez, @kmax12, @MarcoGorelli, @rwedge, @thehomebrewnerd
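
The feature-specific top_n option for encode_features mentioned in this release (#735) can be pictured with a small plain-Python sketch: each categorical column is one-hot encoded using only its most frequent values, and the number of values kept may differ per column. This is not the library's code; `encode`, `resolve_top_n`, `DEFAULT_TOP_N`, and the output column names are hypothetical and chosen only for illustration.

```python
from collections import Counter

DEFAULT_TOP_N = 10  # assumed fallback for columns missing from the dictionary


def resolve_top_n(top_n, column):
    """Return the number of values to keep for `column`.

    `top_n` is either a single int or a dict mapping column name to int.
    """
    if isinstance(top_n, dict):
        return top_n.get(column, DEFAULT_TOP_N)
    return top_n


def encode(rows, columns, top_n=DEFAULT_TOP_N):
    """One-hot encode `columns`, keeping only each column's most frequent values."""
    encoded = [dict() for _ in rows]
    for column in columns:
        n = resolve_top_n(top_n, column)
        counts = Counter(row[column] for row in rows)
        kept = [value for value, _ in counts.most_common(n)]
        for row, out in zip(rows, encoded):
            for value in kept:
                out[f"{column} = {value}"] = int(row[column] == value)
            # everything outside the kept values falls into one catch-all column
            out[f"{column} is unknown"] = int(row[column] not in kept)
    return encoded


rows = [
    {"country": "US", "device": "phone"},
    {"country": "US", "device": "phone"},
    {"country": "FR", "device": "tablet"},
    {"country": "DE", "device": "tablet"},
    {"country": "US", "device": "desktop"},
]
encoded = encode(rows, ["country", "device"], top_n={"country": 1, "device": 2})
```

Here `country` keeps only its single most common value while `device` keeps two, so a high-cardinality column can be encoded narrowly without restricting the others.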

featuretools - v0.10.1

Published by rwedge about 5 years ago

v0.10.1 Aug 25, 2019

  • Fixes
    • Fix serialized LatLong data being loaded as strings (#712)
  • Documentation Changes
    • Fixed FAQ cell output (#710)

Thanks to the following people for contributing to this release:
@gsheni, @rwedge

featuretools - v0.10.0

Published by rwedge about 5 years ago

v0.10.0 Aug 19, 2019

The next non-bugfix release of Featuretools will not support Python 2

  • Enhancements
    • Give more frequent progress bar updates and update chunk size behavior (#631, #696)
    • Added drop_first as param in encode_features (#647)
    • Added support for stacking multi-output primitives (#679)
    • Generate transform features of direct features (#623)
    • Added serializing and deserializing from S3 and deserializing from URLs (#685)
    • Added nlp_primitives as an add-on library (#704)
    • Added AutoNormalize to Featuretools plugins (#699)
    • Added functionality for relative units (month/year) in Timedelta (#692)
    • Added categorical-encoding as an add-on library (#700)
  • Fixes
    • Fix performance regression in DFS (#637)
    • Fix deserialization of feature relationship path (#665)
    • Set index after adding ancestor relationship variables (#668)
    • Fix user-supplied variable_types modification in Entity init (#675)
    • Don't calculate dependencies of unnecessary features (#667)
    • Prevent normalize entity's new entity having same index as base entity (#681)
    • Update variable type inference to better check for string values (#683)
  • Changes
    • Moved dask, distributed imports (#634)
  • Documentation Changes
    • Miscellaneous changes (#641, #658)
    • Modified doc_string of top_n in encoding (#648)
    • Hyperlinked ComposeML (#653)
    • Added FAQ (#620, #677)
    • Fixed FAQ question with multiple question marks (#673)
  • Testing Changes
    • Add master and release tests for premium primitives (#660, #669)
    • Miscellaneous changes (#672, #674)

Thanks to the following people for contributing to this release:
@alexjwang, @allisonportis, @ayushpatidar, @CJStadler, @ctduffy, @gsheni, @jeff-hernandez, @jeremyliweishih, @kmax12, @rwedge, @zhxt95
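
To see why relative units such as months (#692) differ from fixed-length units, the following standard-library sketch compares shifting a date by one calendar month with shifting it by a fixed 30 days. It does not show how featuretools implements Timedelta; `add_months` is a hypothetical helper, and clamping to the last day of the target month is an assumption of this example.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift `d` by whole calendar months (hypothetical helper).

    Assumption: if the target month is shorter, clamp to its last day.
    """
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


start = date(2019, 1, 31)
relative = add_months(start, 1)        # calendar-aware shift
absolute = start + timedelta(days=30)  # fixed-duration shift
```

The two results land on different days, because a month has no fixed length in seconds and so cannot be expressed as a plain duration.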

featuretools - v0.9.1

Published by rwedge over 5 years ago

v0.9.1 July 3, 2019

  • Enhancements
    • Speedup groupby transform calculations (#609)
    • Generate features along all paths when there are multiple paths between entities (#600, #608)
  • Fixes
    • Select columns of dataframe using a list (#615)
    • Change type of features calculated on Index features to Categorical (#602)
    • Filter dataframes through forward relationships (#625)
    • Specify Dask version in requirements for python 2 (#627)
    • Keep dataframe sorted by time during feature calculation (#626)
    • Fix bug in encode_features that created duplicate columns of features with multiple outputs (#622)
  • Changes
    • Remove unused variance_selection.py file (#613)
    • Remove Timedelta data param (#619)
    • Remove DaysSince primitive (#628)
  • Documentation Changes
    • Add installation instructions for add-on libraries (#617)
    • Miscellaneous changes (#632, #639)
  • Testing Changes
    • Miscellaneous changes (#595, #612)

Thanks to the following people for contributing to this release: @CJStadler, @gsheni, @kkleidal, @kmax12, @rwedge

featuretools - v0.9.0

Published by rwedge over 5 years ago

v0.9.0 June 19, 2019

  • Enhancements
    • Add unit parameter to timesince primitives (#558)
    • Add ability to install optional add on libraries (#551)
    • Load and save features from open files and strings (#566)
    • Support custom variable types (#571)
    • Support entitysets which have multiple paths between two entities (#572, #544)
    • Added show_info function, more output information added to CLI featuretools info (#525)
  • Fixes
    • Normalize_entity specifies error when 'make_time_index' is an invalid string (#550)
    • Schema version added for entityset serialization (#586)
    • Renamed features have names correctly serialized (#585)
    • Improved error message for index/time_index being the same column in normalize_entity and entity_from_dataframe (#583)
    • Removed all mentions of allow_where (#587, #588)
    • Removed unused variable in normalize entity (#589)
    • Change time since return type to numeric (#606)
  • Changes
    • Refactor get_pandas_data_slice to take single entity (#547)
    • Updates TimeSincePrevious and Diff Primitives (#561)
    • Remove unnecessary time_last variable (#546)
  • Documentation Changes
    • Add Featuretools Enterprise to documentation (#563)
    • Miscellaneous changes (#552, #573, #577, #599)
  • Testing Changes
    • Miscellaneous changes (#559, #569, #570, #574, #584, #590)

Thanks to the following people for contributing to this release:
@alexjwang, @allisonportis, @CJStadler, @ctduffy, @gsheni, @kmax12, @rwedge

featuretools - v0.8.0

Published by rwedge over 5 years ago

v0.8.0 May 17, 2019

  • Rename NUnique to NumUnique (#510)
  • Serialize features as JSON (#532)
  • Drop all variables at once in normalize_entity (#533)
  • Remove unnecessary sorting from normalize_entity (#535)
  • Features cache their names (#536)
  • Only calculate features for instances before cutoff (#523)
  • Remove all relative imports (#530)
  • Added FullName Variable Type (#506)
  • Add error message when target entity does not exist (#520)
  • New demo links (#542)
  • Remove duplicate features check in DFS (#538)
  • featuretools_primitives entry point expects list of primitive classes (#529)
  • Update ALL_VARIABLE_TYPES list (#526)
  • More Informative N Jobs Prints and Warnings (#511)
  • Update sklearn version requirements (#541)
  • Update Makefile (#519)
  • Remove unused parameter in Entity._handle_time (#524)
  • Remove build_ext code from setup.py (#513)
  • Documentation updates (#512, #514, #515, #521, #522, #527, #545)
  • Testing updates (#509, #516, #517, #539)

Thanks to the following people for contributing to this release: @bphi, @CharlesBradshaw, @CJStadler, @glentennis, @gsheni, @kmax12, @rwedge

featuretools - v0.7.1

Published by rwedge over 5 years ago

v0.7.1 Apr 24, 2019

  • Automatically generate feature name for controllable primitives (#481)
  • Primitive docstring updates (#489, #492, #494, #495)
  • Change primitive functions that returned strings to return functions (#499)
  • CLI customizable via entrypoints (#493)
  • Improve calculation of aggregation features on grandchildren (#479)
  • Refactor entrypoints to use decorator (#483)
  • Include doctests in testing suite (#491)
  • Documentation updates (#490)
  • Update how standard primitives are imported internally (#482)

Thanks to the following people for contributing to this release: @bukosabino, @CharlesBradshaw, @glentennis, @gsheni, @jeff-hernandez, @kmax12, @minkvsky, @rwedge, @thehomebrewnerd

featuretools - v0.7.0

Published by rwedge over 5 years ago

v0.7.0 Mar 29, 2019

  • Improve Entity Set Serialization (#361)
  • Support calling a primitive instance's function directly (#461, #468)
  • Support other libraries extending featuretools functionality via entrypoints (#452)
  • Remove featuretools install command (#475)
  • Add GroupByTransformFeature (#455, #472, #476)
  • Update Haversine Primitive (#435, #462)
  • Add commutative argument to SubtractNumeric and DivideNumeric primitives (#457)
  • Add FilePath variable_type (#470)
  • Add PhoneNumber, DateOfBirth, URL variable types (#447)
  • Generalize infer_variable_type, convert_variable_data and convert_all_variable_data methods (#423)
  • Documentation updates (#438, #446, #458, #469)
  • Testing updates (#440, #444, #445, #459)

Thanks to the following people for contributing to this release: @bukosabino, @CharlesBradshaw, @ColCarroll, @glentennis, @grayskripko, @gsheni, @jeff-hernandez, @jrkinley, @kmax12, @RogerTangos, @rwedge

Breaking Changes

  • ft.dfs now has a groupby_trans_primitives parameter that DFS uses to automatically construct features that group by an ID column and then apply a transform primitive to each group. This change applies to the following primitives: CumSum, CumCount, CumMean, CumMin, and CumMax.

    Previous behavior

    .. code-block:: python

      ft.dfs(entityset=es,
             target_entity='customers',
             trans_primitives=["cum_mean"])
    

    New behavior

    .. code-block:: python

      ft.dfs(entityset=es,
             target_entity='customers',
             groupby_trans_primitives=["cum_mean"])
    
  • Related to the above change, cumulative transform features are now defined using a new feature class, GroupByTransformFeature.

    Previous behavior

    .. code-block:: python

      ft.Feature([base_feature, groupby_feature], primitive=CumulativePrimitive)
    

    New behavior

    .. code-block:: python

      ft.Feature(base_feature, groupby=groupby_feature, primitive=CumulativePrimitive)
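
The difference between a plain cumulative transform and a group-by cumulative transform can be sketched in plain Python. This is only an illustration of the idea behind grouping by an ID column before applying a cumulative primitive, not featuretools code; `cum_mean` and `groupby_cum_mean` are hypothetical helper names.

```python
from collections import defaultdict


def cum_mean(values):
    """Running mean over all rows, in order."""
    out, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out


def groupby_cum_mean(values, groups):
    """Running mean computed separately within each group, in row order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    out = []
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
        out.append(totals[g] / counts[g])
    return out


amounts = [10.0, 20.0, 30.0, 40.0]
customer_ids = ["a", "b", "a", "b"]

plain = cum_mean(amounts)                          # mixes both customers
grouped = groupby_cum_mean(amounts, customer_ids)  # one running mean per customer
```

With these inputs the plain running mean blends rows from both customers, while the grouped version restarts the accumulation for each customer ID.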
    

featuretools - v0.6.1

Published by rwedge over 5 years ago

v0.6.1 Feb 15, 2019

  • Cumulative primitives (#410)
  • Entity.query_by_values now preserves row order of underlying data (#428)
  • Implementing Country Code and Sub Region Codes as variable types (#430)
  • Added IPAddress and EmailAddress variable types (#426)
  • Install data and dependencies (#403)
  • Add TimeSinceFirst, fix TimeSinceLast (#388)
  • Allow user to pass in desired feature return types (#372)
  • Add new configuration object (#401)
  • Replace NUnique get_function (#434)
  • _calculate_identity_features now only returns the features asked for, instead of the entire entity (#429)
  • Primitive function name uniqueness (#424)
  • Update NumCharacters and NumWords primitives (#419)
  • Removed Variable.dtype (#416, #433)
  • Change to zipcode rep, str for pandas (#418)
  • Remove pandas version upper bound (#408)
  • Make S3 dependencies optional (#404)
  • Check that agg_primitives and trans_primitives are right primitive type (#397)
  • Mean primitive changes (#395)
  • Fix transform stacking on multi-output aggregation (#394)
  • Fix list_primitives (#391)
  • Handle graphviz dependency (#389, #396, #398)
  • Testing updates (#402, #417, #433)
  • Documentation updates (#400, #409, #415, #417, #420, #421, #422, #431)

Thanks to the following people for contributing to this release: @CharlesBradshaw, @csala, @floscha, @gsheni, @jxwolstenholme, @kmax12, @RogerTangos, @rwedge

featuretools - v0.6.0

Published by rwedge over 5 years ago

v0.6.0 Jan 30, 2019

  • Primitive refactor (#364)
  • Mean ignore NaNs (#379)
  • Plotting entitysets (#382)
  • Add seed features later in DFS process (#357)
  • Multiple output column features (#376)
  • Add ZipCode Variable Type (#367)
  • Add primitive.get_filepath and example of primitive loading data from external files (#380)
  • Transform primitives take series as input (#385)
  • Update dependency requirements (#378, #383, #386)
  • Add modulo to override tests (#384)
  • Update documentation (#368, #377)
  • Update README.md (#366, #373)
  • Update CI tests (#359, #360, #375)

Thanks to the following people for contributing to this release: @floscha, @gsheni, @kmax12, @RogerTangos, @rwedge

featuretools - v0.5.1

Published by kmax12 almost 6 years ago

v0.5.1 Dec 17, 2018

  • Add missing dependencies (#353)
  • Move comment to note in documentation (#352)

featuretools - v0.5.0

Published by rwedge almost 6 years ago

v0.5.0 Dec 17, 2018

  • Add specific error for duplicate additional/copy_variables in normalize_entity (#348)
  • Removed EntitySet._import_from_dataframe (#346)
  • Removed time_index_reduce parameter (#344)
  • Allow installation of additional primitives (#326)
  • Fix DatetimeIndex variable conversion (#342)
  • Update Sklearn DFS Transformer (#343)
  • Clean up entity creation logic (#336)
  • Remove casting to list in transform feature calculation (#330)
  • Fix sklearn wrapper (#335)
  • Add readme to pypi
  • Update conda docs after move to conda-forge (#334)
  • Add wrapper for scikit-learn Pipelines (#323)
  • Remove parse_date_cols parameter from EntitySet._import_from_dataframe (#333)

Thanks to the following people for contributing to this release: @bukosabino, @georgewambold, @gsheni, @jeff-hernandez, @kmax12, and @rwedge.

featuretools - v0.4.0

Published by rwedge almost 6 years ago

v0.4.0 Oct 31, 2018

  • Remove ft.utils.gen_utils.getsize and make pympler a test requirement (#299)
  • Update requirements.txt (#298)
  • Refactor EntitySet.find_path(...) (#295)
  • Clean up unused methods (#293)
  • Remove unused parents property of Entity (#283)
  • Removed relationships parameter (#284)
  • Improve time index validation (#285)
  • Encode features with "unknown" class in categorical (#287)
  • Allow where clauses on direct features in Deep Feature Synthesis (#279)
  • Change to fullargsspec (#288)
  • Parallel verbose fixes (#282)
  • Update tests for python 3.7 (#277)
  • Check duplicate rows cutoff times (#276)
  • Load retail demo data using compressed file (#271)

featuretools - v0.3.1

Published by rwedge about 6 years ago

v0.3.1 Sept 28, 2018

  • Handling time rewrite (#245)
  • Update deep_feature_synthesis.py (#249)
  • Handling return type when creating features from DatetimeTimeIndex (#266)
  • Update retail.py (#259)
  • Improve Consistency of Transform Primitives (#236)
  • Update demo docstrings (#268)
  • Handle non-string column names (#255)
  • Clean up merging of aggregation primitives (#250)
  • Add tests for Entity methods (#262)
  • Handle no child data when calculating aggregation features with multiple arguments (#264)
  • Add is_string utils function (#260)
  • Update python versions to match docker container (#261)
  • Handle where clause when no child data (#258)
  • No longer cache demo csvs, remove config file (#257)
  • Avoid stacking "expanding" primitives (#238)
  • Use randomly generated names in retail csv (#233)
  • Update README.md (#243)

featuretools - v0.3.0

Published by rwedge about 6 years ago

v0.3.0 Aug 27, 2018

  • Improve performance of all feature calculations (#224)
  • Update agg primitives to use more efficient functions (#215)
  • Optimize metadata calculation (#229)
  • More robust handling when no data at a cutoff time (#234)
  • Workaround categorical merge (#231)
  • Switch which CSV is associated with which variable (#228)
  • Remove unused kwargs from query_by_values, filter_and_sort (#225)
  • Remove convert_links_to_integers (#219)
  • Add conda install instructions (#223, #227)
  • Add example of using Dask to parallelize to docs (#221)

featuretools - v0.2.2

Published by rwedge about 6 years ago

v0.2.2 Aug 20, 2018

  • Remove unnecessary check no related instances call and refactor (#209)
  • Improve memory usage through support for pandas categorical types (#196)
  • Bump minimum pandas version from 0.20.3 to 0.23.0 (#216)
  • Better parallel memory warnings (#208, #214)
  • Update demo datasets (#187, #201, #207)
  • Make primitive lookup case insensitive (#213)
  • Use capital name (#211)
  • Set class name for Min (#206)
  • Remove variable_types from normalize entity (#205)
  • Handle parquet serialization with last time index (#204)
  • Reset index of cutoff times in calculate feature matrix (#198)
  • Check argument types for .normalize_entity (#195)
  • Type checking ignore entities. (#193)

featuretools - v0.2.1

Published by rwedge over 6 years ago

v0.2.1 July 2, 2018

  • Cpu count fix (#176)
  • Update flight (#175)
  • Move feature matrix calculation helper functions to separate file (#177)

featuretools - v0.2.0

Published by rwedge over 6 years ago

v0.2.0 June 22, 2018

  • Multiprocessing (#170)
  • Handle unicode encoding in repr throughout Featuretools (#161)
  • Clean up EntitySet class (#145)
  • Add support for building and uploading conda package (#167)
  • Parquet serialization (#152)
  • Remove variable stats (#171)
  • Make sure index variable comes first (#168)
  • No last time index update on normalize (#169)
  • Remove list of times as an option for cutoff_time in calculate_feature_matrix (#165)
  • Config does error checking to see if it can write to disk (#162)

Package Rankings
Top 19.62% on Anaconda.org
Top 0.99% on Pypi.org
Top 5.19% on Proxy.golang.org
Top 8.11% on Conda-forge.org