# Lightning implementation of a seq2seq dialog model

## Repository layout

```
src/
    __init__.py
    data.py
    inference.py
    lightning.py
    model.py
    tokenizer.py
Makefile
README.md
archi.png
get_dataset.py
requirements.txt
train.py
```
## Makefile targets

- `make install-requirements`: install the Python dependencies
- `make install-apex`: install apex for mixed-precision training
- `make get-amazon`: download Amazon QA and train the bpe tokenizer
- `make collect-amazon`: collect the Amazon QA dataset
- `make get-opensubtitles`: download OpenSubtitles and train the bpe tokenizer
- `make collect-opensubtitles`: collect the OpenSubtitles dataset
- `make train-amazon`: train on Amazon QA
- `make train-opensubtitles`: train on OpenSubtitles
## Data

`get_dataset.py` downloads the raw data, trains a bpe tokenizer, and collects the training samples. Samples are serialized as:

- Amazon QA: `query <SEP> response`
- OpenSubtitles: `context <CTX> context <CTX> context <SEP> response`

Amazon QA is single-turn, while OpenSubtitles keeps up to 3 previous turns of context (`--max_n_context`). Both sources are handled by `get_dataset.py`; see the full argument list below.
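A minimal sketch of this serialization. The helper names are hypothetical (the repo's actual code lives in `src/data.py` and `src/tokenizer.py`); only the `<SEP>`/`<CTX>` layout is taken from the formats above:

```python
SEP, CTX = "<SEP>", "<CTX>"

def serialize_amazon(query: str, response: str) -> str:
    # Amazon QA is single-turn: query <SEP> response
    return f"{query} {SEP} {response}"

def serialize_opensubtitles(context: list, response: str, max_n_context: int = 3) -> str:
    # OpenSubtitles: up to max_n_context previous turns joined with <CTX>,
    # then <SEP> and the response
    turns = context[-max_n_context:]
    return f" {CTX} ".join(turns) + f" {SEP} {response}"
```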
Sequence bucketing groups samples of similar length into the same batch. Without it, every batch is padded to its longest sequence, so compute is wasted on padding and a single long sequence can trigger an OOM; with it, padding is minimal and memory use is predictable. In our runs this cut training time by roughly 33% for the GPT model (measured in compute).
Dynamic batching is implemented in `src/data` (`BatchingStrategy`) on top of sequence bucketing: the batch size adapts to the sequence length so that the number of tokens per batch stays roughly constant, e.g. 64 sequences of length 64 or 8 sequences of length 512 (the maximum). Enable it with `--batching_type db`.
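A minimal sketch of dynamic batching under a fixed token budget. The function name and the budget value are illustrative, not the repo's actual `BatchingStrategy` API:

```python
def bucket_batches(lengths, max_tokens=4096):
    """Sequence bucketing + dynamic batch size: sort samples by length, then
    fill each batch until batch_size * longest_sequence would exceed the
    token budget (e.g. 64 sequences of length 64, or 8 of length 512)."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, batch = [], []
    for i in order:
        longest = lengths[i]  # ascending order, so the current item is longest
        if batch and (len(batch) + 1) * longest > max_tokens:
            batches.append(batch)
            batch = []
        batch.append(i)
    if batch:
        batches.append(batch)
    return batches
```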
## Model

The model is GPT, as described in Improving Language Understanding by Generative Pre-Training. The attention masking follows Unified Language Model Pre-training for Natural Language Understanding and Generation (UniLM): under the seq2seq LM objective, source tokens (context and query) use bidirectional self-attention over the source, while response tokens attend to the whole source and, causally, to the preceding response tokens. Training mixes this seq2seq objective with a plain causal LM objective; the same `Seq2SeqLM` model is used for both, and only the attention/self-attention mask changes between the two modes.
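The UniLM-style seq2seq mask can be sketched as follows. This is a standalone pure-Python illustration (in the repo it would be a tensor mask inside `src/model.py`), not the actual implementation:

```python
def seq2seq_attention_mask(src_len, tgt_len):
    """Build a (src_len+tgt_len) x (src_len+tgt_len) boolean mask where
    mask[i][j] is True iff position i may attend to position j:
    source tokens see the whole source (bidirectional), response tokens
    see the whole source plus earlier response tokens (causal)."""
    n = src_len + tgt_len
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < src_len:
                mask[i][j] = True              # everyone attends to the source
            elif i >= src_len and j <= i:
                mask[i][j] = True              # responses: causal over responses
    return mask
```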
The probability of sampling the seq2seq objective for a batch is annealed from 0.1 to 0.9 over the first 40000 training steps (see `--seq2seq_min_prob`, `--seq2seq_max_prob`, `--min_training_steps`).
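A sketch of that objective-mixing schedule, assuming linear annealing (the exact schedule shape used in `src/lightning.py` may differ; the endpoints and step count match the defaults above):

```python
def seq2seq_objective_prob(step, min_prob=0.1, max_prob=0.9, total_steps=40000):
    """Probability of sampling the seq2seq objective (vs. the causal LM
    objective) at a given training step, annealed linearly and then held."""
    frac = min(step / total_steps, 1.0)
    return min_prob + frac * (max_prob - min_prob)
```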
### train.py arguments

```
# train
--model_type, type=str, default=seq2seq
--data_source, type=str, default=amazon
--data_dir, type=str, default=./data/amazon
--checkpoint_path, type=str, default=./data/amazon/checkpoint
--project_name, type=str, default=LightningConversation
--max_norm, type=float, default=2.5
--distributed_backend, type=str, default=ddp
--gpus, type=int, default=1 if torch.cuda.is_available() else 0
--n_grad_accumulate, type=int, default=1
--batching_type, type=str, default=db
--num_workers, type=int, default=1
--batch_size, type=int, default=64
--max_length, type=int, default=64
--seed, type=int, default=42
--seq2seq_min_prob, type=float, default=0.1
--seq2seq_max_prob, type=float, default=0.9
--min_training_steps, type=int, default=40000
# model
--model_dim, type=int, default=768
--num_heads, type=int, default=12
--feed_forward_dim, type=int, default=3072
--num_layers, type=int, default=12
--response_segment_index, type=int, default=1
--query_segment_index, type=int, default=2
--context_segment_index, type=int, default=3
--weight_tying, action=store_true
--n_positions, type=int, default=65
--dropout, type=float, default=0.1
--initializer_range, type=float, default=0.02
# loss
--criterion, type=str, default=label_smoothing
--smoothing, type=float, default=0.1
--use_kl, action=store_true
# optimizers & schedulers
--optimizer, type=str, default=adam
--learning_rate, type=float, default=0.001
--weight_decay, type=float, default=0.
--momentum, type=float, default=0.9
--nesterov, action=store_true
--warmup_steps, type=int, default=4000
--lr_scheduler, type=str, default=none
```
### get_dataset.py arguments

```
--data_source, type=str, required=True
--data_dir, type=str, required=True
--sep_token, type=str, default=<SEP>
--context_token, type=str, default=None
--max_n_context, type=int, default=3
--max_train_samples, type=int, default=int(1.e+7)
--n_bpe_train_samples, type=int, default=int(1.e+7)
--verbose, action=store_true
--download, action=store_true
--train_bpe, action=store_true
--collect_data, action=store_true
--chunk_size, type=int, default=int(1.5e+6)
--min_validation_size, type=int, default=100000
--validation_prob, type=float, default=0.1
--min_chars, type=int, default=25
--max_chars, type=int, default=512
--min_tokens, type=int, default=10
--min_tokens_query, type=int, default=3
--min_tokens_response, type=int, default=3
--max_tokens, type=int, default=128
--max_unknowns, type=int, default=3
--vocab_size, type=int, default=32000
--bpe_coverage, type=float, default=0.999
```
## TODO

- MMI