Regression-Datasets

🎛️ A collection of diverse regression datasets, featuring PyTorch-like dataset classes that automatically download and load datasets.

MIT License


This repository offers a diverse collection of regression datasets across vision, audio and text domains. It provides dataset classes that follow the PyTorch Datasets structure, allowing users to automatically download and load these datasets with ease. All datasets come with a permissive license, permitting their use for research purposes.

1. Installation

To install the regsets package, you can use pip:

python -m pip install regsets

Alternatively, you can download a specific dataset file (e.g., utkface.py) and include it in your project to load the dataset locally.

2. Usage

Below are examples of how to use the regsets package for loading datasets.

Vision Datasets

from regsets.vision import UTKFace

utkface_trainset = UTKFace(root="./data", split="train", download=True)

for image, label in utkface_trainset:
    ...
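
The loop above needs only the map-style dataset protocol (`__len__` and `__getitem__`) used by PyTorch datasets. The sketch below illustrates that protocol with a hypothetical in-memory class, `ToyRegressionDataset`, which is not part of regsets and stands in for a real dataset so the example runs without any download.

```python
class ToyRegressionDataset:
    """Hypothetical stand-in for a map-style dataset; not part of regsets."""

    def __init__(self, samples, labels):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = list(samples)
        self.labels = [float(y) for y in labels]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Indexing past the end raises IndexError, which ends a plain for-loop.
        return self.samples[index], self.labels[index]


# Placeholder strings stand in for images; the labels are made-up ages.
toy = ToyRegressionDataset(["img_a", "img_b", "img_c"], [21, 35, 62])
pairs = [(sample, label) for sample, label in toy]
```

Any object with these two methods can be iterated, indexed, or passed to a batching utility in the same way.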

Audio Datasets

from regsets.audio import VCC2018

vcc2018_trainset = VCC2018(root="./data", split="train", download=True)

for audio, sample_rate, label in vcc2018_trainset:
    ...
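
Audio clips usually differ in length, so batching them tends to require padding. The following is a minimal sketch of one way to do that, assuming each item is an `(audio, sample_rate, label)` tuple as in the loop above and that `audio` is a flat sequence of floats. That shape is an assumption made for this example, and `pad_audio_batch` is a hypothetical helper, not a regsets function.

```python
def pad_audio_batch(items, pad_value=0.0):
    """Pad 1-D waveforms to the length of the longest one in the batch.

    `items` is a list of (audio, sample_rate, label) tuples. `audio` is
    assumed to be a flat sequence of floats (an assumption of this sketch).
    """
    if not items:
        raise ValueError("empty batch")

    # Refuse to mix sample rates; resampling is out of scope here.
    sample_rates = {sample_rate for _, sample_rate, _ in items}
    if len(sample_rates) != 1:
        raise ValueError(f"mixed sample rates in batch: {sorted(sample_rates)}")

    lengths = [len(audio) for audio, _, _ in items]
    max_len = max(lengths)
    padded = [
        list(audio) + [pad_value] * (max_len - len(audio))
        for audio, _, _ in items
    ]
    labels = [float(label) for _, _, label in items]
    return padded, lengths, sample_rates.pop(), labels


# Made-up waveforms and scores, for illustration only.
batch = [([0.1, 0.2, 0.3], 16000, 3.5), ([0.4], 16000, 2.0)]
padded, lengths, sample_rate, labels = pad_audio_batch(batch)
```

Returning the original lengths alongside the padded waveforms lets downstream code mask out the padding.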

Text Datasets

from regsets.text import Amazon_Review

amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)

for texts, label in amazon_review_trainset:
    (ori, aug_0, aug_1) = texts
    ...
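
Because each text item carries three strings, it can be convenient to regroup a batch by view before further processing. This is a small sketch of that regrouping, assuming the `((ori, aug_0, aug_1), label)` item layout shown above. `split_text_views` is a hypothetical helper, not part of regsets.

```python
def split_text_views(items):
    """Regroup ((ori, aug_0, aug_1), label) items into per-view lists."""
    originals, first_augs, second_augs, labels = [], [], [], []
    for (ori, aug_0, aug_1), label in items:
        originals.append(ori)
        first_augs.append(aug_0)
        second_augs.append(aug_1)
        labels.append(float(label))
    return originals, first_augs, second_augs, labels


# Made-up reviews and labels, for illustration only.
items = [
    (("great phone", "excellent phone", "great handset"), 4),
    (("arrived broken", "came broken", "arrived damaged"), 0),
]
originals, first_augs, second_augs, labels = split_text_views(items)
```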

3. Datasets

For datasets that do not provide a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. Details for each dataset are provided below.
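
A random 80/20 split of this kind can be sketched as follows. The seed, the rounding rule, and the helper name `split_indices` are assumptions made for illustration; the exact procedure used to build the released splits is not specified here.

```python
import random


def split_indices(num_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into train and test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    num_train = round(num_samples * train_fraction)
    return indices[:num_train], indices[num_train:]


# 23,705 is the sum of the UTKFace training and test counts listed below.
train_idx, test_idx = split_indices(23705)
```

With 23,705 samples this gives 18,964 training and 4,741 test indices, matching the UTKFace counts in the table.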

Vision Datasets

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| UTKFace | 18,964 | - | 4,741 | [1, 116] |

Audio Datasets

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| BVCC | 4,974 | 1,066 | 1,066 | [1, 5] |
| VCC2018 | 16,464 | - | 4,116 | [1, 5] |

Text Datasets

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| Amazon Review | 250,000 | 25,000 | 650,000 | [0, 4] |
| Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4] |
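
The target ranges listed above differ widely between datasets, so it can help to rescale labels to [0, 1] before training and map predictions back afterwards. The sketch below shows plain min-max scaling. `normalize_target` and `denormalize_target` are hypothetical helpers, and nothing here implies that regsets applies such scaling itself.

```python
def normalize_target(value, low, high):
    """Map a label from [low, high] to [0, 1] with min-max scaling."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def denormalize_target(unit_value, low, high):
    """Map a value in [0, 1] back to the original [low, high] range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return low + unit_value * (high - low)


# UTKFace target range taken from the table: [1, 116].
UTKFACE_RANGE = (1, 116)
normalized = normalize_target(30, *UTKFACE_RANGE)
restored = denormalize_target(normalized, *UTKFACE_RANGE)
```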

4. License

Distributed under the MIT License. See LICENSE for more information.

5. Contact

6. Acknowledgments
