A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
LGPL-3.0 License
Published by rhiever about 7 years ago
TPOT now supports sparse matrices with a new built-in TPOT configuration, "TPOT sparse". We are using a custom OneHotEncoder implementation that supports missing values and continuous features.
We have added an "early stopping" option that stops the optimization process if no improvement is made within a set number of generations. Look up the early_stop parameter to access this functionality.
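As a sketch of what these two features enable together (the dataset below is synthetic and the TPOT call is shown commented out so the snippet stays lightweight), a SciPy sparse matrix can be fed to TPOT directly once the "TPOT sparse" configuration is selected:

```python
import numpy as np
from scipy import sparse

# Build a synthetic feature matrix where ~95% of entries are zero,
# stored efficiently in compressed sparse row (CSR) format.
X = sparse.random(100, 50, density=0.05, format="csr", random_state=42)
y = np.random.RandomState(42).randint(0, 2, size=100)

print(X.shape, X.nnz)  # only a few hundred of the 5000 entries are stored

# With the "TPOT sparse" configuration, X can be passed to TPOT as-is:
# from tpot import TPOTClassifier
# tpot = TPOTClassifier(config_dict="TPOT sparse", early_stop=5)
# tpot.fit(X, y)  # halts early if 5 generations pass with no improvement
```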
TPOT now reduces the number of duplicated pipelines between generations, which saves you time during the optimization process.
TPOT now supports custom scoring functions via the command-line mode.
We have added a new optional argument, periodic_checkpoint_folder, that allows TPOT to periodically save the best pipeline so far to a local folder during the optimization process.
TPOT no longer uses sklearn.externals.joblib when n_jobs=1, to avoid the potential freezing issue that scikit-learn suffers from.
We have added pandas as a dependency to read input datasets instead of numpy.recfromcsv. NumPy's recfromcsv function is unable to parse datasets with complex data types.
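To illustrate why this matters (a small synthetic CSV, not anything from TPOT's API), pandas parses mixed column types and missing entries cleanly:

```python
import io
import pandas as pd

# A small CSV mixing integers, strings, floats, and missing entries.
csv = io.StringIO("age,city,income\n34,Boston,51000.5\n28,,47250.0\n45,Austin,\n")
df = pd.read_csv(csv)

print(df.dtypes)        # age -> int64, city -> object, income -> float64
print(df.isna().sum())  # missing entries are parsed as NaN, per column
```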
Fixed a bug where DEFAULT in the parameter(s) of a nested estimator raised a KeyError when exporting pipelines.
Fixed a bug related to setting random_state in nested estimators. The issue occurred with pipelines containing SelectFromModel (with ExtraTreesClassifier as the nested estimator) or StackingEstimator, if the nested estimator has a random_state parameter.
Fixed a bug in the missing-value imputation function in TPOT so that it imputes along columns instead of rows.
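The corrected behavior can be illustrated with a hypothetical NumPy helper (a conceptual stand-in, not TPOT's actual implementation): each NaN is replaced by the median of its column, not its row:

```python
import numpy as np

def impute_median_by_column(X):
    """Replace NaNs with the median of each column (not each row)."""
    X = X.astype(float).copy()
    col_medians = np.nanmedian(X, axis=0)  # one median per column
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_medians[nan_cols]
    return X

X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])
X_imputed = impute_median_by_column(X)
print(X_imputed)
# NaN in column 0 -> 2.0 (median of 1, 3); NaN in column 1 -> 15.0
```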
Refined input checking for sparse matrices in TPOT.
Published by rhiever over 7 years ago
TPOT now detects whether there are missing values in your dataset and replaces them with the median value of the column.
TPOT now allows you to set a group parameter in the fit function so you can use the GroupKFold cross-validation strategy.
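As a scikit-learn illustration of the strategy this enables (synthetic data; in TPOT the group labels are passed via fit), GroupKFold guarantees that all samples sharing a group label land on the same side of every split:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# 12 samples from 4 subjects; samples from one subject must never be
# split between the training and test folds.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])

splits = list(GroupKFold(n_splits=4).split(X, y, groups))
for train_idx, test_idx in splits:
    # no subject ever appears on both sides of a split
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
print(len(splits), "splits; groups never straddle a fold")
```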
TPOT now allows you to set a subsample ratio of the training instances with the subsample parameter. For example, setting subsample=0.5 tells TPOT to create a fixed subsample of half of the training data for the pipeline optimization process. This parameter can be useful for speeding up the pipeline optimization process, but may give less accurate performance estimates from cross-validation.
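Conceptually (a hypothetical NumPy sketch, not TPOT's internals), subsample=0.5 amounts to drawing one fixed half-size subsample up front and evaluating every candidate pipeline against it:

```python
import numpy as np

def fixed_subsample(X, y, subsample=0.5, random_state=42):
    """Draw one fixed random subsample of the training data
    (illustrative helper, not part of TPOT's API)."""
    rng = np.random.RandomState(random_state)
    n_keep = int(len(X) * subsample)
    idx = rng.choice(len(X), size=n_keep, replace=False)
    return X[idx], y[idx]

X = np.arange(200).reshape(100, 2)
y = np.arange(100)
X_sub, y_sub = fixed_subsample(X, y, subsample=0.5)
print(X_sub.shape)  # (50, 2): half the rows, reused for every evaluation
```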
TPOT now has more built-in configurations, including TPOT MDR and TPOT light, for both classification and regression problems.
TPOTClassifier and TPOTRegressor now expose three useful internal attributes: fitted_pipeline_, pareto_front_fitted_pipelines_, and evaluated_individuals_. These attributes are described in the API documentation.
Oh, TPOT now has thorough API documentation. Check it out!
Fixed a reproducibility issue where setting random_state didn't necessarily result in the same results every time. This bug was present since TPOT v0.7.
Refined input checking in TPOT.
Removed Python 2-incompatible code.
Published by rhiever over 7 years ago
TPOT 0.7 is now out, featuring multiprocessing support for Linux and macOS, customizable operator configurations, and more.
TPOT now has multiprocessing support (Linux and macOS only). You can use multiple processes to accelerate pipeline optimization via the n_jobs parameter in both TPOTClassifier and TPOTRegressor.
TPOT now allows you to customize the list of operators and parameters explored during the optimization process via the config_dict parameter. The format of this customized dictionary can be found in the online documentation.
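The dictionary maps operator import paths to the hyperparameter values TPOT may explore; the particular operators and values below are just an illustrative choice:

```python
# Custom configuration: operator import paths -> hyperparameter search values.
tpot_config = {
    "sklearn.naive_bayes.GaussianNB": {},  # no hyperparameters to tune
    "sklearn.tree.DecisionTreeClassifier": {
        "criterion": ["gini", "entropy"],
        "max_depth": range(1, 11),
    },
    "sklearn.preprocessing.StandardScaler": {},
}

# Passed to TPOT at construction time:
# from tpot import TPOTClassifier
# tpot = TPOTClassifier(config_dict=tpot_config)
```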
TPOT now allows you to specify a time limit for evaluating a single pipeline (default limit is 5 minutes) during the optimization process with the max_eval_time_mins parameter, so TPOT won't spend hours evaluating overly complex pipelines.
We tweaked TPOT's underlying evolutionary optimization algorithm to work even better, including using the mu+lambda algorithm. This algorithm gives you more control over how many pipelines are generated every iteration with the offspring_size parameter.
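The mu+lambda scheme can be sketched on a toy numeric problem (a pure-Python illustration, not TPOT's actual code; here lam plays the role that offspring_size plays in TPOT):

```python
import random

def mu_plus_lambda(fitness, init_pop, mu=5, lam=10, gens=30, seed=0):
    """Toy (mu + lambda) evolution strategy: each generation, lam offspring
    are produced by mutating randomly chosen parents, then parents and
    offspring compete together and only the best mu survive."""
    rng = random.Random(seed)
    pop = list(init_pop)
    for _ in range(gens):
        offspring = [p + rng.gauss(0, 0.5) for p in rng.choices(pop, k=lam)]
        # "plus" selection: survivors are drawn from parents AND offspring,
        # so the best individual found so far can never be lost
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:mu]
    return pop

# Maximize -(x - 3)^2 starting from a population of zeros; the
# population drifts toward x = 3 over the generations.
best = mu_plus_lambda(lambda x: -(x - 3) ** 2, [0.0] * 5)
print(best[0])
```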
Fixed a reproducibility issue where setting random_state didn't necessarily result in the same results every time. This bug was present since version 0.6.
Refined the default operators and parameters in TPOT, so TPOT 0.7 should work even better than 0.6.
TPOT now supports sample weights in the fitness function if some of your samples are more important to classify correctly than others. The sample weights option works the same as in scikit-learn, e.g., tpot.fit(x_train, y_train, sample_weights=sample_weights).
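The effect of weights on a fitness score can be illustrated with a small NumPy computation (a conceptual stand-in, not TPOT's internal scorer):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred, sample_weight):
    """Accuracy where each sample counts in proportion to its weight."""
    correct = (y_true == y_pred).astype(float)
    return np.average(correct, weights=sample_weight)

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0])  # only the last sample is misclassified

print(weighted_accuracy(y_true, y_pred, np.ones(4)))    # 0.75: plain accuracy
print(weighted_accuracy(y_true, y_pred, [1, 1, 1, 9]))  # 0.25: that one error is costly
```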
The default scoring metric in TPOT has been changed from balanced accuracy to accuracy, the same default metric for classification algorithms in scikit-learn. Balanced accuracy can still be used by setting scoring='balanced_accuracy' when creating a TPOT instance.
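The difference between the two metrics matters on imbalanced data; here is a minimal NumPy illustration (simplified implementations for exposition, not scikit-learn's):

```python
import numpy as np

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return np.mean(recalls)

# 9 negatives, 1 positive; a model that always predicts "negative".
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)

print(accuracy(y_true, y_pred))           # 0.9 -- looks strong
print(balanced_accuracy(y_true, y_pred))  # 0.5 -- reveals the useless model
```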
Published by rhiever about 8 years ago
TPOT now has TPOTClassifier and TPOTRegressor classes to support classification and regression problems, respectively. The command-line interface also supports this feature through the -mode parameter.
TPOT now has a max_time_mins parameter, so you don't need to guess any more how long TPOT will take to recommend a pipeline to you.
TPOT now supports XGBoostClassifier and XGBoostRegressor in its pipelines.
Published by rhiever about 8 years ago
After a couple months hiatus in refactor land, we're excited to release the latest and greatest version of TPOT v0.5. For the past couple months, we worked on heavily refactoring TPOT's code base from a hacky research demo into a more elegant code base that will be easier to maintain in the long run. As an added bonus, TPOT now directly optimizes over and exports to scikit-learn Pipeline objects, so your auto-generated code should be much more readable.
Major changes in v0.5:
TPOT now supports any scoring function that cross_val_score supports.
Published by rhiever over 8 years ago
In TPOT 0.4, we've made some major changes to the internals of TPOT and added some convenience functions. We've summarized the changes below.
Published by rhiever over 8 years ago
Zenodo requires me to make a new release to assign a DOI, so here's that release. This is not a full release.
Published by rhiever over 8 years ago
This is the version of TPOT that was used in the GECCO 2016 paper, "Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science."
Published by rhiever almost 9 years ago
New in v0.2.0:
Published by rhiever almost 9 years ago