Fast and Accurate ML in 3 Lines of Code
APACHE-2.0 License
v0.8.3 is a patch release to address security vulnerabilities.
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/0.8.2...0.8.3
This version supports Python versions 3.8, 3.9, and 3.10.
- `transformers` and other packages version upgrades + some fixes: @suzhoum (#4155)

Published by Innixma 6 months ago
We're happy to announce the AutoGluon 1.1 release.
AutoGluon 1.1 contains major improvements to the TimeSeries module, achieving a 60% win-rate vs AutoGluon 1.0 through the addition of Chronos, a pretrained model for time series forecasting, along with numerous other enhancements. The other modules have also been enhanced through new features such as Conv-LoRA support and improved performance for large tabular datasets between 5 and 30 GB in size. For a full breakdown of AutoGluon 1.1 features, please refer to the feature spotlights and the itemized enhancements below.
Join the community:
Get the latest updates:
This release supports Python versions 3.8, 3.9, 3.10, and 3.11. Loading models trained on older versions of AutoGluon is not supported. Please re-train models using AutoGluon 1.1.
This release contains 125 commits from 20 contributors!
Full Contributor List (ordered by # of commits):
@shchur @prateekdesai04 @Innixma @canerturkmen @zhiqiangdon @tonyhoo @AnirudhDagar @Harry-zzh @suzhoum @FANGAreNotGnu @nimasteryang @lostella @dassaswat @afmkt @npepin-hub @mglowacki100 @ddelange @LennartPurucker @taoyang1122 @gradientsky
Special thanks to @ddelange for their continued assistance with Python 3.11 support and Ray version upgrades!
AutoGluon has experienced widespread adoption on Kaggle since the AutoGluon 1.0 release.
AutoGluon has been used in over 130 Kaggle notebooks and mentioned in over 100 discussion threads in the past 90 days!
Most excitingly, AutoGluon has already been used to achieve top ranking placements in multiple competitions with thousands of competitors since the start of 2024:
Placement | Competition | Author | Date | AutoGluon Details | Notes |
---|---|---|---|---|---|
🥉 Rank 3/2303 (Top 0.1%) | Steel Plate Defect Prediction | Samvel Kocharyan | 2024/03/31 | v1.0, Tabular | Kaggle Playground Series S4E3 |
🥈 Rank 2/93 (Top 2%) | Prediction Interval Competition I: Birth Weight | Oleksandr Shchur | 2024/03/21 | v1.0, Tabular | |
🥈 Rank 2/1542 (Top 0.1%) | WiDS Datathon 2024 Challenge #1 | lazy_panda | 2024/03/01 | v1.0, Tabular | |
🥈 Rank 2/3746 (Top 0.1%) | Multi-Class Prediction of Obesity Risk | Kirderf | 2024/02/29 | v1.0, Tabular | Kaggle Playground Series S4E2 |
🥈 Rank 2/3777 (Top 0.1%) | Binary Classification with a Bank Churn Dataset | lukaszl | 2024/01/31 | v1.0, Tabular | Kaggle Playground Series S4E1 |
Rank 4/1718 (Top 0.2%) | Multi-Class Prediction of Cirrhosis Outcomes | Kirderf | 2024/01/01 | v1.0, Tabular | Kaggle Playground Series S3E26 |
We are thrilled that the data science community is leveraging AutoGluon as their go-to method to quickly and effectively achieve top-ranking ML solutions! For an up-to-date list of competition solutions using AutoGluon refer to our AWESOME.md, and don't hesitate to let us know if you use AutoGluon in a competition!
AutoGluon-TimeSeries now features Chronos, a family of forecasting models pretrained on large collections of open-source time series datasets that can generate accurate zero-shot predictions for new unseen data. Check out the new tutorial to learn how to use Chronos through the familiar `TimeSeriesPredictor` API.
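A minimal zero-shot sketch of that workflow; the preset name and dataset URL below are assumptions for illustration, so treat the tutorial as the authoritative reference:

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Long-format data with (item_id, timestamp, target) columns; URL is a placeholder.
train_data = TimeSeriesDataFrame.from_path(
    "https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly/train.csv"
)

# Chronos is pretrained, so "fitting" this preset mostly wires the model up
# for zero-shot inference rather than training it on your data.
predictor = TimeSeriesPredictor(prediction_length=48).fit(
    train_data,
    presets="chronos_small",  # assumed preset name
)
predictions = predictor.predict(train_data)
```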
AutoGluon 1.1 comes with numerous new features and improvements to the time series module. These include highly requested functionality such as feature importance, support for categorical covariates, ability to visualize forecasts, and enhancements to logging. The new release also comes with considerable improvements to forecast accuracy, achieving 60% win rate and 3% average error reduction compared to the previous AutoGluon version. These improvements are mostly attributed to the addition of Chronos, improved preprocessing logic, and native handling of missing values.
- Added `TimeSeriesPredictor.feature_importance()`. @canerturkmen (#4033, #4087)
- Added `TimeSeriesPredictor.persist()`. @canerturkmen (#4005)
- Added `TimeSeriesPredictor.plot()`. @shchur (#3889)
- Added the `RMSLE` evaluation metric. @canerturkmen (#3938)
- Added the `keep_lightning_logs` hyperparameter. @shchur (#3937)

AutoMM 1.1 introduces the innovative Conv-LoRA, a parameter-efficient fine-tuning (PEFT) method stemming from our latest paper presented at ICLR 2024, titled "Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model". Conv-LoRA is designed for fine-tuning the Segment Anything Model, exhibiting superior performance compared to previous PEFT approaches, such as LoRA and visual prompt tuning, across various semantic segmentation tasks in diverse domains including natural images, agriculture, remote sensing, and healthcare. Check out our Conv-LoRA example.
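A minimal sketch of what fine-tuning SAM looks like through `MultiModalPredictor`; the PEFT hyperparameter key/value below is an assumption, so treat the linked Conv-LoRA example as the authoritative reference:

```python
from autogluon.multimodal import MultiModalPredictor

# train_df/val_df are assumed pandas DataFrames whose "image" and "label"
# columns hold paths to images and ground-truth segmentation masks.
predictor = MultiModalPredictor(
    problem_type="semantic_segmentation",
    label="label",
    # Assumed hyperparameter enabling Conv-LoRA parameter-efficient fine-tuning.
    hyperparameters={"optimization.efficient_finetune": "conv_lora"},
)
predictor.fit(train_data=train_df, tuning_data=val_df)
masks = predictor.predict(test_df)
```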
AutoGluon-Tabular 1.1 primarily focuses on bug fixes and stability improvements. In particular, we have greatly improved the runtime performance for large datasets between 5 and 30 GB in size by subsampling data for decision threshold calibration and by fitting the weighted ensemble on at most 1 million rows, maintaining the same quality while being far faster to execute. We also adjusted the default weighted ensemble iterations from 100 to 25, which speeds up all weighted ensemble fit times by 4x. We heavily refactored the `fit_pseudolabel` logic, and it should now achieve noticeably stronger results.
- Fixed `predictor.fit_weighted_ensemble(refit_full=True)`. @Innixma (#1956)
- Refactored the `fit_pseudolabel` logic. @Innixma (#3930)

Published by Innixma 11 months ago
Today is finally the day... AutoGluon 1.0 has arrived!! After over four years of development and 2061 commits from 111 contributors, we are excited to share with you the culmination of our efforts to create and democratize the most powerful, easy to use, and feature rich automated machine learning system in the world. AutoGluon 1.0 comes with transformative enhancements to predictive quality resulting from the combination of multiple novel ensembling innovations, spotlighted below. Besides performance enhancements, many other improvements have been made that are detailed in the individual module sections.
This release supports Python versions 3.8, 3.9, 3.10, and 3.11. Loading models trained on older versions of AutoGluon is not supported. Please re-train models using AutoGluon 1.0.
This release contains 223 commits from 17 contributors!
Special thanks to @LennartPurucker for leading development of dynamic stacking, @geoalgo for co-authoring TabRepo to enable Zeroshot-HPO, @ddelange for helping to add Python 3.11 support, and @mglowacki100 for providing numerous feedback and suggestions.
Full Contributor List (ordered by # of commits):
@shchur, @zhiqiangdon, @Innixma, @prateekdesai04, @FANGAreNotGnu, @yinweisu, @taoyang1122, @LennartPurucker, @Harry-zzh, @AnirudhDagar, @jaheba, @gradientsky, @melopeo, @ddelange, @tonyhoo, @canerturkmen, @suzhoum
Join the community:
Get the latest updates:
AutoGluon 1.0 features major enhancements to predictive quality, establishing a new state-of-the-art in Tabular modeling. To the best of our knowledge, AutoGluon 1.0 marks the largest leap forward in the state-of-the-art for tabular data since the original AutoGluon paper from March 2020. The enhancements come primarily from two features: Dynamic stacking to mitigate stacked overfitting, and a new learned model hyperparameters portfolio via Zeroshot-HPO, obtained from the newly released TabRepo ensemble simulation library. Together, they lead to a 75% win-rate compared to AutoGluon 0.8 with faster inference speed, lower disk usage, and higher stability.
OpenML released the official 2023 AutoML Benchmark results on November 16th, 2023. Their results show AutoGluon 0.8 as the state-of-the-art in AutoML systems across a wide variety of tasks: "Overall, in terms of model performance, AutoGluon consistently has the highest average rank in our benchmark." We now showcase that AutoGluon 1.0 achieves far superior results even to AutoGluon 0.8!
Below is a comparison on the OpenML AutoML Benchmark across 1040 tasks. LightGBM, XGBoost, and CatBoost results were obtained via AutoGluon, and other methods are from the official AutoML Benchmark 2023 results. AutoGluon 1.0 has a 95%+ win-rate against traditional tabular models, including a 99% win-rate vs LightGBM and a 100% win-rate vs XGBoost. AutoGluon 1.0 has between an 82% and 94% win-rate against other AutoML systems. For all methods, AutoGluon is able to achieve >10% average loss improvement (Ex: Going from 90% accuracy to 91% accuracy is a 10% loss improvement). AutoGluon 1.0 achieves first place in 63% of tasks, with lightautoml having the second most at 12% (AutoGluon 0.8 previously took first place 48% of the time). AutoGluon 1.0 even achieves a 7.4% average loss improvement over AutoGluon 0.8!
Method | AG Winrate | AG Loss Improvement | Rescaled Loss | Rank | Champion |
---|---|---|---|---|---|
AutoGluon 1.0 (Best, 4h8c) | - | - | 0.04 | 1.95 | 63% |
lightautoml (2023, 4h8c) | 84% | 12.0% | 0.2 | 4.78 | 12% |
H2OAutoML (2023, 4h8c) | 94% | 10.8% | 0.17 | 4.98 | 1% |
FLAML (2023, 4h8c) | 86% | 16.7% | 0.23 | 5.29 | 5% |
MLJAR (2023, 4h8c) | 82% | 23.0% | 0.33 | 5.53 | 6% |
autosklearn (2023, 4h8c) | 91% | 12.5% | 0.22 | 6.07 | 4% |
GAMA (2023, 4h8c) | 86% | 15.4% | 0.28 | 6.13 | 5% |
CatBoost (2023, 4h8c) | 95% | 18.2% | 0.28 | 6.89 | 3% |
TPOT (2023, 4h8c) | 91% | 23.1% | 0.4 | 8.15 | 1% |
LightGBM (2023, 4h8c) | 99% | 23.6% | 0.4 | 8.95 | 0% |
XGBoost (2023, 4h8c) | 100% | 24.1% | 0.43 | 9.5 | 0% |
RandomForest (2023, 4h8c) | 97% | 25.1% | 0.53 | 9.78 | 1% |
Not only is AutoGluon more accurate in 1.0, it is also more stable thanks to our new usage of Ray subprocesses during low-memory training, resulting in 0 task failures on the AutoML Benchmark.
AutoGluon 1.0 is capable of achieving the fastest inference throughput of any AutoML system while still obtaining state-of-the-art results. By specifying the `infer_limit` fit argument, users can trade off between accuracy and inference speed to meet their needs (see the sketch below).
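For example, a sketch of constraining end-to-end inference latency at fit time (the dataset path and label column are placeholders):

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder dataset

predictor = TabularPredictor(label="class").fit(
    train_data,
    # Require roughly 20,000 rows/second end-to-end (0.00005 s per row),
    # measured in batches of 10,000 rows.
    infer_limit=0.00005,
    infer_limit_batch_size=10000,
)
```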
As seen in the below plot, AutoGluon 1.0 sets the Pareto Frontier for quality and inference throughput, achieving Pareto Dominance compared to all other AutoML systems. AutoGluon 1.0 High achieves superior performance to AutoGluon 0.8 Best with 8x faster inference and 8x less disk usage!
You can get more details on the results here.
We would like to conclude this spotlight by thanking Pieter Gijsbers, Sébastien Poirier, Erin LeDell, Joaquin Vanschoren, and the rest of the AutoML Benchmark authors for their key role in providing a shared and extensive benchmark to monitor the progress of the AutoML field. Their support has been invaluable to the AutoGluon project's continued growth.
We would also like to thank Frank Hutter, who continues to be a leader within the AutoML field, for organizing the AutoML Conference in 2022 and 2023 to bring the community together to share ideas and align on a compelling vision.
We are excited to see what our users can accomplish with AutoGluon 1.0's enhanced performance. As always, we will continue to improve AutoGluon in future releases to push the boundaries of AutoML forward for all.
We have published a paper on AutoGluon-TimeSeries at AutoML Conference 2023 (YouTube Video). In the paper, we benchmarked AutoGluon and popular open-source forecasting frameworks (including DeepAR, TFT, AutoARIMA, AutoETS, AutoPyTorch). AutoGluon produces SOTA results in point and probabilistic forecasting, and even achieves 65% win rate against the best-in-hindsight combination of models.
We have published a paper on Tabular Zeroshot-HPO ensembling simulation to arXiv (Paper Link, GitHub). This paper is key to achieving the performance improvements seen in AutoGluon 1.0, and we plan to continue to develop the code-base to support future enhancements.
We have published a paper on tabular Transformer pre-training at ICML 2023 (Paper Link, GitHub). In the paper we demonstrate state-of-the-art performance for tabular deep learning models, including being able to match the performance of XGBoost and LightGBM models. While the pre-trained transformer is not yet incorporated into AutoGluon, we plan to integrate it in a future release.
Our paper on learning multimodal data augmentation was accepted at ICLR 2023 (Paper Link, GitHub). This paper introduces a plug-and-play module to learn multimodal data augmentation in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that it can (1) improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data. This work is not yet incorporated into AutoGluon, but we plan to integrate it in a future release.
Our paper on generative object detection data augmentation has been accepted at WACV 2024 (Paper and Github link will be available soon). This paper proposes a data augmentation pipeline based on controllable diffusion models and CLIP, with visual prior generation to guide the generation and post-filtering by category-calibrated CLIP scores to control its quality. We demonstrate that the performance improves across various tasks and settings when using our augmentation pipeline with different detectors. Although diffusion models are currently not integrated into AutoGluon, we plan to incorporate the data augmentation techniques in a future release.
We have published a paper on how to efficiently adapt image foundation models for video understanding at ICLR 2023 (Paper Link, GitHub). This paper introduces spatial adaptation, temporal adaptation and joint adaptation to gradually equip a frozen image model with spatiotemporal reasoning capability. The proposed method achieves competitive or even better performance than traditional full finetuning while largely saving the training cost of large foundation models.
Dependency version upgrades:
- `>=2.0,<2.2` @zhiqiangdon @yinweisu @shchur (#3404, #3587, #3588)
- `>=1.21,<1.29` @prateekdesai04 (#3709)
- `>=2.0,<2.2` @yinweisu @tonyhoo @shchur (#3498)
- `>=1.3,<1.5` @yinweisu @tonyhoo @shchur (#3498)
- `>=10.0.1,<11` @jaheba (#3688)
- `>=1.5.4,<1.13` @prateekdesai04 (#3709)
- `>=3.3,<4.2` @mglowacki100 @prateekdesai04 @Innixma (#3427, #3709, #3733)
- `>=1.6,<2.1` @Innixma (#3768)

AutoGluon 1.0 features major enhancements to predictive quality, establishing a new state-of-the-art in Tabular modeling. Refer to the spotlight section above for more details!
- Added the `dynamic_stacking` predictor fit argument to mitigate stacked overfitting @LennartPurucker @Innixma (#3616)
- New Zeroshot-HPO learned portfolio for the `best_quality` and `high_quality` presets. @Innixma @geoalgo (#3750)
- Added experimental scikit-learn compatible wrappers: `from autogluon.tabular.experimental import TabularClassifier, TabularRegressor`. @Innixma (#3769)
- Added `predictor.model_failures()` @Innixma (#3421)
- Added `predictor.simulation_artifact()` to support integration with TabRepo @Innixma (#3555)
- Fixed `infer_limit` being used incorrectly when bagging @Innixma (#3467)

AutoGluon MultiModal (AutoMM) is designed to simplify the fine-tuning of foundation models for downstream applications with just three lines of code (see the sketch below). It seamlessly integrates with popular model zoos such as HuggingFace Transformers, TIMM, and MMDetection, providing support for a diverse range of data modalities, including image, text, tabular, and document data, whether used individually or in combination.
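The "three lines of code" workflow in its simplest form (column name and dataframes are placeholders):

```python
from autogluon.multimodal import MultiModalPredictor

# train_df/test_df are assumed pandas DataFrames with a "label" column plus
# any mix of text, image-path, and tabular feature columns.
predictor = MultiModalPredictor(label="label")
predictor.fit(train_df)
scores = predictor.evaluate(test_df)
```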
- Added new problem type `semantic_segmentation`, for fine-tuning Segment Anything Model (SAM) with three lines of code. @Harry-zzh @zhiqiangdon (#3645, #3677, #3697, #3711, #3722, #3728)
- Added new `few_shot_classification` problem type for training few shot classifiers on images or texts. @zhiqiangdon (#3662, #3681, #3695)
- `eval_metric` argument support. @taoyang1122 (#3548)
- Added `hf_text.use_fast` for customizing fast tokenizer usage in `hf_text` models. @zhiqiangdon (#3379)
- Added `f1_macro`, `f1_micro`, and `f1_weighted`. @FANGAreNotGnu (#3696)

AutoGluon 1.0 features numerous usability and performance improvements to the TimeSeries module. These include automatic handling of missing data and irregular time series, new forecasting metrics (including custom metric support), advanced time series cross-validation options, and new forecasting models.
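A toy sketch of the missing-data handling described above (all names and the preset choice are illustrative):

```python
import numpy as np
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Long-format toy data: one item, daily timestamps, some missing target values.
timestamps = pd.date_range("2024-01-01", periods=100, freq="D")
df = pd.DataFrame({
    "item_id": "A",
    "timestamp": timestamps,
    "target": np.where(np.arange(100) % 7 == 0, np.nan, np.arange(100, dtype=float)),
})
data = TimeSeriesDataFrame.from_data_frame(df)

# NaN targets no longer need manual imputation before fitting.
predictor = TimeSeriesPredictor(prediction_length=7).fit(data, presets="fast_training")
```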
- New forecasting metrics `WAPE`, `RMSSE`, `SQL` + improved documentation for metrics @melopeo @shchur (#3747, #3632, #3510, #3490)
- `TimeSeriesPredictor` can now handle data with all pandas frequencies, irregular timestamps, or missing values represented by `NaN` @shchur (#3563, #3454)
- New models: intermittent demand forecasting models from StatsForecast (`ADIDA`, `CrostonClassic`, `CrostonOptimized`, `CrostonSBA`, `IMAPA`); `WaveNet` and `NPTS` from GluonTS; new baseline models (`Average`, `SeasonalAverage`, `Zero`) @canerturkmen @shchur (#3706, #3742, #3606, #3459)
- Advanced cross-validation options: avoid refitting models for each validation window with `refit_every_n_windows`, or adjust the step size between validation windows with the `val_step_size` argument to `TimeSeriesPredictor.fit` @shchur (#3704, #3537)
- `TimeSeriesPredictor.evaluate` @shchur (#3646)
- New `TimeSeriesDataFrame.from_path` and `TimeSeriesDataFrame.from_data_frame` constructors @shchur (#3635)
- Updates to the `DirectTabular` and `RecursiveTabular` models (#3740, #3620, #3559)
- Faster import of `autogluon.timeseries` by moving import statements inside model classes (#3514)
- Aligned the API of `TimeSeriesPredictor` with `TabularPredictor`, removed deprecated methods @shchur (#3714, #3655, #3396)

The EDA module will be released at a later time, as it requires additional development effort before it is ready for 1.0. We will make an announcement when EDA is ready for release. For now, please continue to use `autogluon.eda==0.8.2`.
- `autogluon.core.spaces` has been deprecated. Please use `autogluon.common.spaces` instead @Innixma (#3701)

Tabular will log warnings if using the deprecated methods. Deprecated methods are planned to be removed in AutoGluon 1.2 @Innixma (#3701)
autogluon.tabular.TabularPredictor
- `predictor.get_model_names()` -> `predictor.model_names()`
- `predictor.get_model_names_persisted()` -> `predictor.model_names(persisted=True)`
- `predictor.compile_models()` -> `predictor.compile()`
- `predictor.persist_models()` -> `predictor.persist()`
- `predictor.unpersist_models()` -> `predictor.unpersist()`
- `predictor.get_model_best()` -> `predictor.model_best`
- `predictor.get_pred_from_proba()` -> `predictor.predict_from_proba()`
- `predictor.get_oof_pred_proba()` -> `predictor.predict_proba_oof()`
- `predictor.get_oof_pred()` -> `predictor.predict_oof()`
- `predictor.get_model_full_dict()` -> `predictor.model_refit_map()`
- `predictor.get_size_disk()` -> `predictor.disk_usage()`
- `predictor.get_size_disk_per_file()` -> `predictor.disk_usage_per_file()`
- `predictor.leaderboard()`: `silent` argument deprecated, replaced by `display`, defaults to `False`
  - Same for `predictor.evaluate()` and `predictor.evaluate_predictions()`
- Deprecated `FewShotSVMPredictor` in favor of the new `few_shot_classification` problem type @zhiqiangdon (#3699)
- Deprecated `AutoMMPredictor` in favor of `MultiModalPredictor` @zhiqiangdon (#3650)

autogluon.multimodal.MultiModalPredictor
- Deprecated the `config` argument in the fit API. @zhiqiangdon (#3679)
- Deprecated the `init_scratch` and `pipeline` arguments in the init API @zhiqiangdon (#3668)

autogluon.timeseries.TimeSeriesPredictor
- Deprecated `TimeSeriesPredictor(ignore_time_index: bool)`. Now, if the data contains irregular timestamps, either convert it to regular frequency with `data = data.convert_frequency(freq)` or provide the frequency when creating the predictor as `TimeSeriesPredictor(freq=freq)` (see the sketch at the end of this section).
- `predictor.evaluate()` now returns a dictionary (previously returned a float)
- `predictor.score()` -> `predictor.evaluate()`
- `predictor.get_model_names()` -> `predictor.model_names()`
- `predictor.get_model_best()` -> `predictor.model_best`
- Metric `"mean_wQuantileLoss"` has been renamed to `"WQL"`
- `predictor.leaderboard()`: `silent` argument deprecated, replaced by `display`, defaults to `False`
- When setting `hyperparameters` to a string in `predictor.fit()`, supported values are now `"default"`, `"light"` and `"very_light"`
autogluon.timeseries.TimeSeriesDataFrame
- `df.to_regular_index()` -> `df.convert_frequency()`
- Deprecated `df.get_reindexed_view()`. Please see the deprecation notes for `ignore_time_index` under `TimeSeriesPredictor` above for information on how to deal with irregular timestamps
- MXNet models (`DeepARMXNet`, `MQCNNMXNet`, `MQRNNMXNet`, `SimpleFeedForwardMXNet`, `TemporalFusionTransformerMXNet`, `TransformerMXNet`) have been removed
- Statistical models (`ARIMA`, `Theta`, `ETS`) have been replaced by their counterparts from StatsForecast (#3513). Note that these models now have different hyperparameter names.
- `DirectTabular` is now implemented using the `mlforecast` backend (same as `RecursiveTabular`); most hyperparameter names for the model have changed.
- `autogluon.timeseries.TimeSeriesEvaluator` has been deprecated. Please use the metrics available in `autogluon.timeseries.metrics` instead.
- `autogluon.timeseries.splitter.MultiWindowSplitter` and `autogluon.timeseries.splitter.LastWindowSplitter` have been deprecated. Please use the `num_val_windows` and `val_step_size` arguments to `TimeSeriesPredictor.fit` instead (alternatively, use `autogluon.timeseries.splitter.ExpandingWindowSplitter`).
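A short sketch of the migration for irregular data, following the deprecation note above (the file path is a placeholder):

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

data = TimeSeriesDataFrame.from_path("train.csv")  # placeholder path

# Option 1: regularize the data itself to a daily frequency.
regular_data = data.convert_frequency(freq="D")

# Option 2: declare the frequency on the predictor and let it handle the rest.
predictor = TimeSeriesPredictor(freq="D", prediction_length=7).fit(data)
```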
Published by yinweisu over 1 year ago
v0.8.2 is a hotfix release that pins the `pydantic` version to avoid crashing during HPO.
As always, only load previously trained models using the same version of AutoGluon that they were originally trained on.
Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/0.8.1...0.8.2
This version supports Python versions 3.8, 3.9, and 3.10.
Published by yinweisu over 1 year ago
v0.8.1 is a bug fix release.
As always, only load previously trained models using the same version of AutoGluon that they were originally trained on.
Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/0.8.0...0.8.1
This version supports Python versions 3.8, 3.9, and 3.10.
- Fixed the `DirectTabular` model failing for some metrics; hide warnings produced by `AutoARIMA` @shchur (#3350)

Published by yinweisu over 1 year ago
We're happy to announce the AutoGluon 0.8 release.
NEW: Join our official community discord server to ask questions and get involved!
Note: Loading models trained in different versions of AutoGluon is not supported.
This release contains 196 commits from 20 contributors!
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/0.7.0...0.8.0
Special thanks to @geoalgo for the joint work in generating the experimental tabular Zeroshot-HPO portfolio this release!
Full Contributor List (ordered by # of commits):
@shchur, @Innixma, @yinweisu, @gradientsky, @FANGAreNotGnu, @zhiqiangdon, @gidler, @liangfu, @tonyhoo, @cheungdaven, @cnpgs, @giswqs, @suzhoum, @yongxinw, @isunli, @jjaeyeon, @xiaochenbin9527, @yzhliu, @jsharpna, @sxjscience
AutoGluon 0.8 supports Python versions 3.8, 3.9, and 3.10.
- Calibrating the decision threshold can dramatically improve metrics such as `f1` and `balanced_accuracy`. It is not uncommon to see `f1` scores improve from 0.70 to 0.73 as an example. We strongly encourage all users who are using these metrics to try out the new decision threshold calibration logic (see the sketch after this list).
- Upgraded object detection presets, now offering `medium_quality`, `high_quality`, and `best_quality` options. The empirical results demonstrate significant ~20% relative improvements in the mAP (mean Average Precision) metric, using the same preset.
- New experimental Zeroshot-HPO configuration, a hybrid with the `best_quality` preset. To try it out, specify `presets="experimental_zeroshot_hpo_hybrid"` when calling `fit()`.
- New experimental TabPFN model: install via `pip install autogluon.tabular[all,tabpfn]` (hyperparameter key is `"TABPFN"`)! You can also try it out via specifying `presets="experimental_extreme_quality"`.
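A minimal sketch of the calibration workflow; it assumes a `set_decision_threshold` setter alongside the `calibrate_decision_threshold` method described in the Tabular section below, and the dataset paths are placeholders:

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder
test_data = TabularDataset("test.csv")    # placeholder

predictor = TabularPredictor(label="class", eval_metric="f1").fit(train_data)

# Find the decision threshold that maximizes the eval_metric (f1 here)
# and apply it to subsequent predict() calls.
threshold = predictor.calibrate_decision_threshold()
predictor.set_decision_threshold(threshold)  # assumed setter method
predictions = predictor.predict(test_data)
```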
AutoGluon MultiModal (also known as AutoMM) introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection. Additionally, we have upgraded the presets for object detection, now offering `medium_quality`, `high_quality`, and `best_quality` options. The empirical results demonstrate significant ~20% relative improvements in the mAP (mean Average Precision) metric, using the same preset.
- `medium_quality`: yolo-s -> yolox-l
- `high_quality`: yolox-l -> DINO-Res50
- `best_quality`: yolox-x -> DINO-Swin_l
- `trainable_parameters` returns the number of trainable parameters.
- `total_parameters` returns the number of total parameters.
- `model_size` returns the model size measured in megabytes.

- Added `calibrate_decision_threshold` (tutorial), which allows optimizing a given metric's decision threshold for predictions to strongly enhance the metric score. @Innixma (#3298)
- New experimental Zeroshot-HPO configuration: specify `presets="experimental_zeroshot_hpo_hybrid"` when calling `fit()` @Innixma @geoalgo (#3312)
- Experimental TabPFN model: `pip install autogluon.tabular[all,tabpfn]`! @Innixma (#3270)
- `included_model_types` @yinweisu (#3239)
- `pred_proba` @Innixma (#3240)

In v0.8 we introduce several major improvements to the Time Series module, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.
- New models: `PatchTST` and `DLinear` from GluonTS, and `RecursiveTabular` based on integration with the `mlforecast` library @shchur (#3177, #3184, #3230)
- Improvements to the `AutoARIMA`, `AutoETS`, `Theta`, `DirectTabular`, `WeightedEnsemble` models @shchur (#3062, #3214, #3252)
- Much faster repeated calls to `predict()`, `leaderboard()` and `evaluate()` thanks to prediction caching @shchur (#3237)
- Use multiple validation windows with the `num_val_windows` argument to `fit()` @shchur (#3080)
- Exclude models with the `excluded_model_types` argument to `fit()` @shchur (#3231)
- New method `refit_full()` that refits models on combined train and validation data @shchur (#3157)
- Train multiple configurations of the same model by providing lists in the `hyperparameters` argument @shchur (#3183)
- `time_limit` is now respected by all models @shchur (#3214)
- Improved the `DirectTabular` model (previously called `AutoGluonTabular`): faster featurization, trained as a quantile regression model if `eval_metric` is set to `"mean_wQuantileLoss"` @shchur (#2973, #3211)
- `TimeSeriesPredictor` loading from disk @shchur (#3233)

In 0.8 we introduce a few new tools to help with data exploration and feature engineering:
- Updated `quick_fit` to use residuals plot @gradientsky (#3039)
- Added `explain_rows` method to `autogluon.eda.auto` - Kernel SHAP visualization @gradientsky (#3014)

Published by tonyhoo over 1 year ago
We're happy to announce the AutoGluon 0.7 release. This release contains a new experimental module `autogluon.eda` for exploratory data analysis. AutoGluon 0.7 offers conda-forge support, enhancements to the Tabular, MultiModal, and Time Series modules, and many quality of life improvements and fixes.
As always, only load previously trained models using the same version of AutoGluon that they were originally trained on.
Loading models trained in different versions of AutoGluon is not supported.
This release contains 170 commits from 19 contributors!
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/v0.6.2...v0.7.0
Special thanks to @MountPOTATO, who is a first-time contributor to AutoGluon this release!
Full Contributor List (ordered by # of commits):
@Innixma, @zhiqiangdon, @yinweisu, @gradientsky, @shchur, @sxjscience, @FANGAreNotGnu, @yongxinw, @cheungdaven,
@liangfu, @tonyhoo, @bryanyzhu, @suzhoum, @canerturkmen, @giswqs, @gidler, @yzhliu, @Linuxdex and @MountPOTATO
AutoGluon 0.7 supports Python versions 3.8, 3.9, and 3.10. Python 3.7 is no longer supported as of this release.
As of AutoGluon 0.7 release, AutoGluon is now available on conda-forge (#612)!
Kudos to the following individuals for making this happen:
autogluon.eda (Exploratory Data Analysis)

We are happy to announce the AutoGluon Exploratory Data Analysis (EDA) toolkit. Starting with v0.7, AutoGluon can now analyze and visualize different aspects of data and models. We invite you to explore the following tutorials: Quick Fit, Dataset Overview, Target Variable Analysis, Covariate Shift Analysis. Other materials can be found in the EDA section of the website.
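A couple of one-liners from the new toolkit to give a feel for the API (the dataframe and label column are placeholders; see the tutorials for the full options):

```python
import autogluon.eda.auto as auto

# High-level overview of the dataset: types, missing values, distributions.
auto.dataset_overview(train_data=train_df, label="class")

# Fit a quick model and render diagnostics such as feature importance.
auto.quick_fit(train_df, label="class")
```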
- Removed `dask` and `distributed` dependencies. @Innixma (#2691)
- Deprecated the `autogluon.text` and `autogluon.vision` modules. We recommend using `autogluon.multimodal` for text and vision tasks going forward.

AutoGluon MultiModal (a.k.a AutoMM) supports three new features: 1) document classification; 2) named entity recognition for the Chinese language; 3) few-shot learning with SVM. Meanwhile, we removed `autogluon.text` and `autogluon.vision` as these features are supported in `autogluon.multimodal`.
- Added `FocalLoss`. @yongxinw (#2860)
- The `autogluon.vision` namespace is deprecated. @bryanyzhu (#2790, #2819, #2832)
- The `autogluon.text` namespace is deprecated. @sxjscience @Innixma (#2695, #2847)

- Faster inference via `infer_limit`, and the `high_quality` preset can satisfy <100 ms end-to-end latency on many datasets by default.
- The `"multimodal"` hyperparameter preset now leverages the full capabilities of MultiModalPredictor, resulting in stronger performance on datasets containing a mix of tabular, image, and text features.
- Refactored the `NN_TORCH` model to be dataset iterable, leading to a 100% inference speedup. @liangfu (#2395)
- MultiModalPredictor is now leveraged when `TabularPredictor.fit` is passed `hyperparameters="multimodal"`. @Innixma (#2890)
- Added `predict_multi` and `predict_proba_multi` methods to `TabularPredictor` to efficiently get predictions from multiple models. @Innixma (#2727)
- Faster `leaderboard` calls when scoring is disabled. @Innixma (#2912)
- Deprecated `predict_proba` with `problem_type="regression"`. This will raise an exception in a future release. @Innixma (#2684)
- Updates to the `NN_TORCH` model. @Innixma (#2909)
- Fixed `calibrate=True, use_bag_holdout=True` in `TabularPredictor.fit`. @Innixma (#2715)
- Fixed `n_estimators` handling with RandomForest / ExtraTrees models. @Innixma (#2735)
- Model compilation via `skl2onnx`. @liangfu (#2923)
- `refit_full` updates. @Innixma (#2913)
- Added `compile_models` to the deployment tutorial. @liangfu (#2717)

- `TimeSeriesPredictor` now supports past covariates (a.k.a. dynamic features or related time series which are not known for the time steps to be predicted). @shchur (#2665, #2680)
- Improved forecast accuracy of `TimeSeriesPredictor` for various presets (`medium_quality`, `high_quality` and `best_quality`). @shchur (#2758)

Published by tonyhoo almost 2 years ago
v0.6.2 is a security and bug fix release.
As always, only load previously trained models using the same version of AutoGluon that they were originally trained on.
Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/v0.6.1...v0.6.2
Special thanks to @daikikatsuragawa and @yzhliu who were first time contributors to AutoGluon this release!
This version supports Python versions 3.7 to 3.9. 0.6.x are the last releases that will support Python 3.7.
Published by gradientsky almost 2 years ago
v0.6.1 is a security fix / bug fix release.
As always, only load previously trained models using the same version of AutoGluon that they were originally trained on.
Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/v0.6.0...v0.6.1
Special thanks to @lvwerra, who is a first-time contributor to AutoGluon this release!
This version supports Python versions 3.7 to 3.9. 0.6.x are the last releases that will support Python 3.7.
Published by gradientsky almost 2 years ago
v0.5.3 is a security hotfix release.
This release is non-breaking when upgrading from v0.5.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.5.2...v0.5.3
This version supports Python versions 3.7 to 3.9.
Published by gradientsky almost 2 years ago
We're happy to announce the AutoGluon 0.6 release. 0.6 contains major enhancements to the Tabular, MultiModal, and Time Series modules, along with many quality of life improvements and fixes.
As always, only load previously trained models using the same version of AutoGluon that they were originally trained on.
Loading models trained in different versions of AutoGluon is not supported.
This release contains 263 commits from 25 contributors!
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.5.2...v0.6.0
Special thanks to @cheungdaven, @suzhoum, @BingzhaoZhu, @liangfu, @Harry-zzh, @gidler, @yongxinw, @martinschaef,
@giswqs, @Jalagarto, @geoalgo, @lujiaying and @leloykun who were first time contributors to AutoGluon this release!
Full Contributor List (ordered by # of commits):
@shchur, @yinweisu, @zhiqiangdon, @Innixma, @FANGAreNotGnu, @canerturkmen, @sxjscience, @gradientsky, @cheungdaven,
@bryanyzhu, @suzhoum, @BingzhaoZhu, @yongxinw, @tonyhoo, @liangfu, @Harry-zzh, @Raldir, @gidler, @martinschaef,
@giswqs, @Jalagarto, @geoalgo, @lujiaying, @leloykun, @yiqings
This version supports Python versions 3.7 to 3.9. This is the last release that will support Python 3.7.
AutoGluon Multimodal (a.k.a AutoMM) supports three new features: 1) object detection, 2) named entity recognition, and 3) multimodal matching. In addition, the HPO backend of AutoGluon Multimodal has been upgraded to ray 2.0. It also supports fine-tuning billion-scale FLAN-T5-XL model on a single AWS g4.2x-large instance with improved parameter-efficient finetuning. Starting from 0.6, we recommend using autogluon.multimodal rather than autogluon.text or autogluon.vision and added deprecation warnings.
- Object Detection: new problem type `"object_detection"`.
- Named Entity Recognition: new problem type `"ner"`.
- Multimodal Matching: new problem types `"text_similarity"`, `"image_similarity"`, and `"image_text_similarity"`.
- Miscellaneous minor fixes. @cheungdaven @FANGAreNotGnu @geoalgo @zhiqiangdon (#2402, #2409, #2026, #2401, #2418)
New experimental model `FT_TRANSFORMER`. @bingzhaozhu, @innixma (#2085, #2379, #2389, #2410)
- You can access it via the `FT_TRANSFORMER` key in the `hyperparameters` dictionary or via `presets="experimental_best_quality"`.

New experimental model compilation support via `predictor.compile_models()`. @liangfu, @innixma (#2225, #2260, #2300)
- To enable compilation: `pip install autogluon.tabular[all,skl2onnx]`.
- It is recommended that `.compile_models` is called only at the very end.

Added `predictor.clone(...)` method to allow perfectly cloning a predictor object to a new directory. This is useful to preserve the state of a predictor prior to altering it (such as prior to calling `.save_space`, `.distill`, `.compile_models`, or `.refit_full`); see the sketch below. @innixma (#2071)

Added simplified `num_gpus` and `num_cpus` arguments to `predictor.fit` to control total resources. @yinweisu, @innixma (#2263)
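A minimal sketch of the cloning workflow above, assuming an existing predictor saved under `ag_models/` (all paths here are placeholders):

```python
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor.load("ag_models/")  # placeholder path

# Snapshot the predictor before a destructive operation such as refit_full,
# so the original state can always be recovered from the clone.
predictor.clone(path="ag_models_backup/")
predictor.refit_full()

# Recover the pre-refit state if needed.
backup = TabularPredictor.load("ag_models_backup/")
```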
Improved stability and effectiveness of HPO functionality via various refactors regarding our usage of ray. @yinweisu, @innixma (#1974, #1990, #2094, #2121, #2133, #2195, #2253, #2263, #2330)
Upgraded dependency versions: XGBoost 1.7, CatBoost 1.1, Scikit-learn 1.1, Pandas 1.5, Scipy 1.9, Numpy 1.23. @innixma (#2373)
Added a Python version compatibility check when loading a fitted TabularPredictor. Will now error if Python versions are incompatible. @innixma (#2054)
Added the `fit_weighted_ensemble` argument to `predictor.fit`. This allows the user to disable the weighted ensemble. @innixma (#2145)
Added cascade ensemble foundation logic. @innixma (#1929)

`infer_limit` improvements. @innixma (#2014)

Refactored `Scorer` classes to be easier to use, plus added comprehensive unit tests for all metrics. @innixma (#2242)

`TimeSeriesPredictor` now supports static features (a.k.a. time series metadata, static covariates) and several new models:
- `DeepAR` and `SimpleFeedForward` (implemented in PyTorch, removing the dependency on MXNet)
- `AutoGluonTabular` relies on XGBoost, LightGBM and CatBoost under the hood via the `autogluon.tabular` module
- `Naive` and `SeasonalNaive` forecasters are simple methods that provide strong baselines with no increase in training time
- `TemporalFusionTransformerMXNet` brings the TFT transformer architecture to AutoGluon. @shchur (#2106)
, ARIMA
Theta
, as well as WeightedEnsemble
. @shchur @canerturkmen (#2001, #2033, #2040, #2067, #2072, #2073, #2180,TimeSeriesPredictor
now handles irregularly sampled time series with ignore_index
. @canerturkmen, @shchur (#1993,TimeSeriesEvaluator
@shchur (#2147, #2150)Improved documentation and new tutorials:
@shchur (#2120, #2127, #2146, #2174, #2187, #2354)
@shchur
@canerturkmen
Published by gradientsky about 2 years ago
v0.5.2 is a security hotfix release.
This release is non-breaking when upgrading from v0.5.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.5.1...v0.5.2
This version supports Python versions 3.7 to 3.9.
Published by gradientsky about 2 years ago
v0.4.3 is a security hotfix release.
This release is non-breaking when upgrading from v0.4.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.2...v0.4.3
This version supports Python versions 3.7 to 3.9.
Published by gradientsky over 2 years ago
We're happy to announce the AutoGluon 0.5.1 release. This release contains major optimizations and bug fixes to the `autogluon.multimodal` and `autogluon.timeseries` modules, as well as inference speed improvements to `autogluon.tabular`.
This release is non-breaking when upgrading from v0.5.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 58 commits from 14 contributors!
Full Contributor List (ordered by # of commits):
This version supports Python versions 3.7 to 3.9.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.5.0...v0.5.1
Changed to a new namespace `autogluon.multimodal` (AutoMM), which is a deep learning "model zoo" of model zoos. On one hand, AutoMM can automatically train deep models for unimodal (image-only, text-only or tabular-only) problems. On the other hand, AutoMM can automatically solve multimodal (any combinations of image, text, and tabular) problems by fusing multiple deep learning models. In addition, AutoMM can be used as a base model in AutoGluon Tabular and participate in the model ensemble.
Supported zero-shot learning with CLIP (#1922) @zhiqiangdon
Improved efficient finetuning
Added more data augmentation techniques
Enhanced teacher-student model distillation
Beginner tutorials of applying AutoMM to image, text, or multimodal (including tabular) data. (#1861, #1908, #1858, #1869) @bryanyzhu @sxjscience @zhiqiangdon
A zero-shot image classification tutorial with the CLIP model. (#1942) @bryanyzhu
A tutorial of using CLIP model to extract embeddings for image-text retrieval. (#1957) @bryanyzhu
A tutorial to introduce comprehensive AutoMM configurations (#1861). @zhiqiangdon
AutoMM for tabular data examples (#1752, #1893, #1903). @yiqings
AutoMM distillation example (#1846). @FANGAreNotGnu
A Kaggle notebook about how to use AutoMM to predict pet adoption: https://www.kaggle.com/code/linuxdex/use-autogluon-to-predict-pet-adoption. The model achieves the score equivalent to top 1% (20th/3537) in this kernel-only competition (test data is only available in the kernel without internet access) (#1796, #1847, #1894, #1943). @Linuxdex
Published by Innixma over 2 years ago
We're happy to announce the AutoGluon 0.5 release. This release contains major new modules `autogluon.timeseries` and `autogluon.multimodal`. In collaboration with the Yu Group of Statistics and EECS from UC Berkeley, we have added interpretable models (imodels) to `autogluon.tabular`.
This release is non-breaking when upgrading from v0.4.2. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 91 commits from 13 contributors!
Full Contributor List (ordered by # of commits):
The imodels integration is based on the following work,
Singh, C., Nasseri, K., Tan, Y.S., Tang, T. and Yu, B., 2021. imodels: a python package for fitting interpretable models. Journal of Open Source Software, 6(61), p.3192.
This version supports Python versions 3.7 to 3.9.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.1...v0.5.0
Full release notes will be available shortly.
Published by gradientsky over 2 years ago
v0.4.2 is a hotfix release to fix a breaking change in protobuf.
This release is non-breaking when upgrading from v0.4.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.1...v0.4.2
This version supports Python versions 3.7 to 3.9.
Published by gradientsky over 2 years ago
We're happy to announce the AutoGluon 0.4.1 release. 0.4.1 contains minor enhancements to Tabular, Text, Image, and Multimodal modules, along with many quality of life improvements and fixes.
This release is non-breaking when upgrading from v0.4.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 55 commits from 10 contributors!
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.0...v0.4.1
Special thanks to @yiqings, @leandroimail, @huibinshen who were first time contributors to AutoGluon this release!
Full Contributor List (ordered by # of commits):
This version supports Python versions 3.7 to 3.9.
Added the `optimization.efficient_finetune` flag to support multiple efficient finetuning algorithms. (#1666) @sxjscience
- `bit_fit`: "BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models"
- `norm_fit`: An extension of the algorithm in "Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs" and BitFit. We finetune both the parameters in the norm layers as well as the biases.

Enabled knowledge distillation for AutoMM (#1670) @zhiqiangdon
- `AutoMMPredictor` reuses the `.fit()` function:

```python
from autogluon.text.automm import AutoMMPredictor

teacher_predictor = AutoMMPredictor(label="label_column").fit(train_data)
student_predictor = AutoMMPredictor(label="label_column").fit(
    train_data,
    hyperparameters=student_and_distiller_hparams,
    teacher_predictor=teacher_predictor,
)
```
Option to turn on returning feature column information (#1711) @zhiqiangdon
- Added a `requires_column_info` flag in data processors and a utility function to turn this flag on or off.

FT-Transformer implementation for tabular data in AutoMM (#1646) @yiqings
Make CLIP support multiple images per sample (#1606) @zhiqiangdon
Avoid using `eos` as the sep token for CLIP. (#1710) @zhiqiangdon
Update fusion transformer in AutoMM (#1712) @yiqings
- Support the `polynomial_decay` scheduler.
- Support the `[CLS]` token in numerical/categorical transformer.

Added more image augmentations: `verticalflip`, `colorjitter`, `randomaffine` (#1719) @Linuxdex, @sxjscience
Added prompts for the percentage of missing images during image column detection. (#1623) @zhiqiangdon
Support `average_precision` in AutoMM (#1697) @sxjscience
Convert `roc_auc` / `average_precision` to `log_loss` for torchmetrics (#1715) @zhiqiangdon
- `torchmetrics.AUROC` requires that both positive and negative examples be available in a mini-batch. When training a large model, the per-GPU batch size is probably small, leading to an incorrect `roc_auc` score. Conversion from `roc_auc` to `log_loss` improves training stability.

Added `pytorch-lightning` 1.6 support (#1716) @sxjscience
Updated the names of top-k checkpoint average methods and support customizing model names for terminal input (#1668) @zhiqiangdon
- Renamed `union_soup` -> `uniform_soup` and `best_soup` -> `best`.
- Renamed `customize_config_names` -> `customize_model_names` and `verify_config_names` -> `verify_model_names` to make them easier to understand.

Implemented the GreedySoup algorithm proposed in the paper. Added `union_soup`, `greedy_soup`, `best_soup` flags and changed the default value correspondingly. (#1613) @sxjscience
Updated the `standalone` flag in `automm.predictor.save()` to save the pretrained model for offline deployment (#1575) @yiqings
Simplified checkpoint template (#1636) @zhiqiangdon
- Simplified `AutoMMPredictor`'s final model checkpoint.
- Pass the `ckpt_path` argument to PyTorch Lightning's trainer only when `resume=True`.

Unified AutoMM's model output format and support customizing model names (#1643) @zhiqiangdon
- Uses model names (`timm_image`, `hf_text`, `clip`, `numerical_mlp`, `categorical_mlp`, and `fusion_mlp`) as prefixes. This is helpful when users want to simultaneously use two models of the same type, e.g., `hf_text`. They can just use the names `hf_text_0` and `hf_text_1`.

Support the `standalone` feature in `TextPredictor` (#1651) @yiqings
Fixed saving and loading tokenizers and text processors (#1656) @zhiqiangdon
- Applies to models saved with `0.4.0`.

Changed `load` from a classmethod to a staticmethod to avoid incorrect usage. (#1697) @sxjscience
Added `AutoMMModelCheckpoint` to avoid evaluating the models to obtain the scores (#1716) @sxjscience

Extract column features from AutoMM's model outputs (#1718) @zhiqiangdon
- Supported for `timm_image`, `hf_text`, and `clip`.

Make AutoMM dataloader return feature column information (#1710) @zhiqiangdon
Fixed calling `save_pretrained_configs` in `AutoMMPredictor.save(standalone=True)` when no fusion model exists (#1651) @yiqings
Fixed error raising for setting key that does not exist in the configuration (#1613) @sxjscience
Fixed warning message about bf16. (#1625) @sxjscience
Fixed the corner case of calculating the gradient accumulation step (#1633) @sxjscience
Fixes for top-k averaging in the multi-gpu setting (#1707) @zhiqiangdon
Limited RF `max_leaf_nodes` to 15000 (previously uncapped) (#1717) @Innixma
- Primarily impacts the `high_quality` preset.

Limit KNN to 32 CPUs to avoid OpenBLAS error (#1722) @Innixma

```
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault: 11
```

This error occurred when the machine had many CPU cores (>64 vCPUs) due to too many threads being created at once. By limiting to 32 cores used, the error is avoided.
Improved memory warning thresholds (#1626) @Innixma
Added `get_results` and `model_base_kwargs` (#1618) @Innixma
- Added `get_results` to searchers, useful for debugging and for future extensions to HPO functionality.
- Added `model_base_kwargs` to `BaggedEnsembleModel` that avoids having to init the base model prior to initing the bagged ensemble model.

Update resource logic in models (#1689) @Innixma
- Fixed a crash when using `auto` for resources.
- Added `get_minimum_resources` to explicitly define minimum resource requirements within a method.

Updated feature importance defaults: `subsample_size` 1000 -> 5000, `num_shuffle_sets` 3 -> 5 (#1708) @Innixma
Added notice to ensure serializable custom metrics (#1705) @Innixma
Fixed `evaluate` when `weight_evaluation=True` (#1612) @Innixma
- Previously errored when calling `predictor.evaluate(...)` or `predictor.evaluate_predictions(...)` when `self.weight_evaluation==True`.

Fixed RuntimeError: dictionary changed size during iteration (#1684, #1685) @leandroimail
Fixed CatBoost custom metric & F1 support (#1690) @Innixma
Fixed HPO not working for bagged models if the bagged model is loaded from disk (#1702) @Innixma
Fixed feature importance erroring if `self.model_best` is `None` (can happen if no weighted ensemble is fit) (#1702) @Innixma
Updated the text tutorial on customizing hyperparameters (#1620) @zhiqiangdon
Improved implementations and docstrings of `save_pretrained_models` and `convert_checkpoint_name`. (#1656) @zhiqiangdon
Added cheat sheet to website (#1605) @yinweisu
Doc fix to use correct predictor when calling leaderboard (#1652) @Innixma
[security] Updated `pillow` to `9.0.1+` (#1615) @gradientsky

[security] Updated `ray` to `1.10.0+` (#1616) @yinweisu
Tabular regression tests improvements (#1555) @willsmithorg
- Tests `TabularPredictor` on both regression and classification tasks, multiple presets, etc.

Disabled image/text predictor when GPU is not available in `TabularPredictor` (#1676) @yinweisu
Use class property to set keys in model classes. In this way, if we customize the prefix key, other keys are automatically updated. (#1669) @zhiqiangdon
Published by Innixma over 2 years ago
We're happy to announce the AutoGluon 0.4 release. 0.4 contains major enhancements to Tabular and Text modules, along with many quality of life improvements and fixes.
This release is non-breaking when upgrading from v0.3.1. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 151 commits from 14 contributors!
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.3.1...v0.4.0
Special thanks to @zhiqiangdon, @willsmithorg, @DolanTheMFWizard, @truebluejason, @killerSwitch, and @Xilorole who were first time contributors to AutoGluon this release!
Full Contributor List (ordered by # of commits):
This version supports Python versions 3.7 to 3.9.
- `pip install autogluon.text` will error on import if installed standalone due to missing `autogluon.features` as a dependency. To fix: `pip install autogluon.features`. This will be resolved in the v0.4.1 release.

AutoGluon-Text is refactored with PyTorch Lightning. It now supports backbones in huggingface/transformers. The new version has better performance, faster training time, and faster inference speed. In addition, AutoGluon-Text now supports solving multilingual problems and a new `AutoMMPredictor` has been implemented for automatically building multimodal DL models.
presets="high_quality"
, the win-rate increased to 77.8% thanks to the DeBERTa-v3 backbone.presets="multilingual"
. You can now train a model on the English dataset and directly apply the model on datasets in other languages such as German, Japanese, Italian, etc..fit()
again after a previous trained model has been loaded.Thanks to @zhiqiangdon and @sxjscience for contributing the AutoGluon-Text refactors! (#1537, #1547, #1557, #1565, #1571, #1574, #1578, #1579, #1581, #1585, #1586)
AutoGluon-Tabular has been majorly enhanced by numerous optimizations in 0.4.
Specific updates:
- Added `infer_limit` and `infer_limit_batch_size` as new fit-time constraints (Tutorial). This allows users to specify the desired end-to-end inference latency of the final model.
- Added `TabularPredictor.fit_pseudolabel(...)`! @DolanTheMFWizard (#1323, #1382)
- Added feature pruning via `TabularPredictor.fit(..., feature_prune_kwargs={})`! @truebluejason (#1274, #1305)
- Added the `calibrate` fit argument. @DolanTheMFWizard (#1336, #1374, #1502)
- Refactored `refit_full` logic to majorly simplify user model contributions and improve multimodal support with advanced presets. @Innixma (#1567)

As part of the migration from MXNet to Torch, we have created a Torch-based counterpart
to the prior MXNet tabular neural network model. This model has several major advantages. It has replaced the MXNet tabular neural network model in the default hyperparameters configuration, and is enabled by default.
Thanks to @jwmueller and @Innixma for contributing TabularNeuralNetTorchModel to AutoGluon! (#1489)
VowpalWabbit has been added as a new model in AutoGluon. VowpalWabbit is not installed by default, and must be installed separately.
VowpalWabbit is used in the `hyperparameters='multimodal'` preset, and the model is a great option to use for datasets containing text features (see the sketch below). To install VowpalWabbit, specify it via `pip install autogluon.tabular[all,vowpalwabbit]` or `pip install "vowpalwabbit>=8.10,<8.11"`.
Thanks to @killerSwitch for contributing VowpalWabbitModel to AutoGluon! (#1422)
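A sketch of opting into the preset that includes VowpalWabbit (the dataset path and label column are placeholders):

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder dataset with text columns

predictor = TabularPredictor(label="class").fit(
    train_data,
    hyperparameters="multimodal",  # preset that includes the VowpalWabbit model
)
```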
Linear models have been accelerated by 20x in training and 20x in inference thanks to a variety of optimizations.
To get the accelerated training speeds, please install scikit-learn-intelex via `pip install "scikit-learn-intelex>=2021.5,<2021.6"`. Note that currently LinearModel is not enabled by default in AutoGluon, and must be specified in `hyperparameters` via the key `'LR'` (see the sketch below).
Further testing is planned to incorporate LinearModel as a default model in future releases.
Thanks to the `scikit-learn-intelex` team and @Innixma for the LinearModel optimizations! (#1378)
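Since LinearModel is opt-in, a sketch of enabling it explicitly (placeholders as before):

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder dataset

# LinearModel is not part of the default hyperparameters; opt in via 'LR'.
predictor = TabularPredictor(label="class").fit(
    train_data,
    hyperparameters={"LR": {}},
)
```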
- Added `autogluon.common` to simplify dependency management for future submodules. @Innixma (#1386)
- Removed the `autogluon.mxnet` and `autogluon.extra` submodules as part of code cleanup. @Innixma (#1397, #1411, #1414)

Published by Innixma about 3 years ago
v0.3.1 is a hotfix release which fixes several major bugs as well as including several model quality improvements.
This release is non-breaking when upgrading from v0.3.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 9 commits from 4 contributors.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.3.0...v0.3.1
Thanks to the 4 contributors that contributed to the v0.3.1 release!
Special thanks to @yinweisu who is a first time contributor to AutoGluon and fixed a major bug in ImagePredictor HPO!
Full Contributor List (ordered by # of commits):
@Innixma, @gradientsky, @yinweisu, @sackoh
- Model quality improvements for the `best_quality` preset.
- Use `-1` as the `n_jobs` value instead of using `os.cpu_count()`. @sackoh (#1289)

Published by Innixma about 3 years ago
v0.3.0 introduces multi-modal image, text, tabular support to AutoGluon. In just a few lines of code, you can train a multi-layer stack ensemble using text, image, and tabular data! To our knowledge this is the first publicly available implementation of a model that handles all 3 modalities at once. Check it out in our brand new multimodal tutorial! v0.3.0 also features a major model quality improvement for Tabular, with a 57.6% win-rate vs v0.2.0 on the AutoMLBenchmark, along with an up to 10x online inference speedup due to low-level numpy and pandas optimizations throughout the codebase! This inference optimization enables AutoGluon to have sub-30-millisecond end-to-end latency for real-time deployment scenarios when paired with model distillation. Finally, AutoGluon can now train PyTorch image models via integration with TIMM. Specify any TIMM model to `ImagePredictor` or `TabularPredictor` to train them with AutoGluon!
This release is non-breaking when upgrading from v0.2.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 70 commits from 10 contributors.
See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.2.0...v0.3.0
Thanks to the 10 contributors that contributed to the v0.3.0 release!
Special thanks to the 3 first-time contributors! @rxjx, @sallypannn, @sarahyurick
Special thanks to @talhaanwarch who opened 21 GitHub issues (!) and participated in numerous discussions during v0.3.0 development. His feedback was incredibly valuable when diagnosing issues and improving the user experience throughout AutoGluon!
Full Contributor List (ordered by # of commits):
@Innixma, @zhreshold, @jwmueller, @gradientsky, @sxjscience, @ValerioPerrone, @taesup-aws, @sallypannn, @rxjx, @sarahyurick
- XGBoost now uses `tree_method='hist'` for improved performance. @Innixma (#1239)
- Added the `groups` parameter. Now users can specify the exact split indices in a `groups` column when performing model bagging. This solution leverages sklearn's LeaveOneGroupOut cross-validator (see the sketch below). @Innixma (#1224)
- Added the `use_bag_holdout` argument. @Innixma (#1105)
- Added `predictor.features()` to get the original feature names used during training. @Innixma (#1257)
- Added `problem_type` support to ImagePredictor. @sallypannn (#1165)
- `predict_proba`. @Innixma (#1206)
- Added `eval_metric='average_precision'`. @rxjx (#1092)
- Added `__version__`. @Innixma (#1122)
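A sketch of the `groups` bagging workflow from the list above (the dataset path and column names are placeholders):

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# "fold" is an assumed column assigning each row to a bagging split.
train_df = pd.read_csv("train.csv")  # placeholder dataset

predictor = TabularPredictor(label="class", groups="fold").fit(train_df)
```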