A bottom-up, incremental refactor of the example scripts (the main ones at least) that will pave the way for more cool stuff to come in the future.
Timeline: ~2 weeks
Also inspired by @srush's simple training document here (Note: hf-internal)
```shell
# Clone the repo, then install torch and transformers
pip install transformers torch
```
- `datasets_glue.py` and `datasets_lm.py` are copy/paste of existing code. They return `torch.utils.data.Dataset`s, but we will plug @thomwolf's framework-agnostic datasets there.
- `data_processor.py` (name is T.B.D.): they possess one or more `Dataset`s.
- `trainer.py` contains `TrainingArgs` (read the docstring there) and a `Trainer` class (currently very partial, see docstring there).
- `run_glue.py` and `run_language_modeling.py`
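As an illustration, the pieces described above might fit together roughly like this. This is only a sketch based on this proposal: all names other than `TrainingArgs`, `Trainer`, and `DataProcessor` are hypothetical, and the signatures are placeholders, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrainingArgs:
    # Hypothetical subset of the arguments run_glue.py currently parses.
    output_dir: str
    num_train_epochs: int = 3
    per_device_batch_size: int = 8
    learning_rate: float = 5e-5


class DataProcessor:
    """Sketch: a DataProcessor possesses one or more Dataset-like objects.

    Here a "Dataset" is anything with __len__/__getitem__, which is all a
    map-style torch.utils.data.Dataset requires.
    """

    def __init__(self, train_dataset, eval_dataset=None):
        self.train_dataset = train_dataset
        self.eval_dataset = eval_dataset


class Trainer:
    """Sketch of the (currently very partial) Trainer."""

    def __init__(self, model: Any, args: TrainingArgs, processor: DataProcessor):
        self.model = model
        self.args = args
        self.processor = processor

    def train(self) -> int:
        # Stand-in loop: a real Trainer would batch, forward, backward, step.
        steps = 0
        for _epoch in range(self.args.num_train_epochs):
            for i in range(len(self.processor.train_dataset)):
                _example = self.processor.train_dataset[i]
                steps += 1
        return steps
```

A refactored `run_glue.py` would then mostly just build the `DataProcessor` and hand it, with the `TrainingArgs`, to the `Trainer`.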
@srush has refactored a few of the examples into lightning, and there seems to be a fair number of users using lightning and contributing other examples.

We do not want to support only lightning examples though, because we may also want to add a `TFTrainer`, or even just support TF transparently (like in `Pipeline`s).

In all cases, this current proposal will also make the pytorch-lightning examples much cleaner and more compact, because most of the heavy lifting is going to be in the `DataProcessor`s and the `TrainingArgs` above (and the datasets) and will be shared.
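To make the sharing concrete, here is a sketch of how a lightning-style module could consume the same shared objects as the plain `Trainer`. The lightning hook names are mimicked in plain Python so the snippet is self-contained; the class bodies are hypothetical, not the actual implementation:

```python
from dataclasses import dataclass


@dataclass
class TrainingArgs:
    # Hypothetical shared arguments (the same object the plain Trainer uses).
    num_train_epochs: int = 1
    per_device_batch_size: int = 2


class DataProcessor:
    # Shared holder of Dataset-like objects (anything with __len__/__getitem__).
    def __init__(self, train_dataset):
        self.train_dataset = train_dataset


class LightningStyleModule:
    """Mimics pytorch-lightning's hook names without the dependency.

    The point: the module shrinks to thin hooks, because batching logic
    and arguments live in the shared DataProcessor/TrainingArgs.
    """

    def __init__(self, args: TrainingArgs, processor: DataProcessor):
        self.args = args
        self.processor = processor

    def train_dataloader(self):
        ds, bs = self.processor.train_dataset, self.args.per_device_batch_size
        for start in range(0, len(ds), bs):
            yield [ds[i] for i in range(start, min(start + bs, len(ds)))]

    def training_step(self, batch, batch_idx):
        # A real module would run the model forward and return a loss here.
        return {"loss": float(len(batch))}
```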