Sentence Reconstruction using Transformer Model
MIT License
This project is a comprehensive exploration of the application of deep learning techniques to the problem of sentence reconstruction. The task is to reconstruct the original word order of an English sentence from a random permutation of its words: for example, given the input "mat the on sat cat the", the model should output "the cat sat on the mat". This is a challenging problem due to the inherent complexity of language and the fact that the number of possible permutations grows factorially with sentence length.
The project is constrained by several factors, most notably a limit on the model's parameter count: the final model stays below 10 million parameters.
The project is implemented in Python.
The project includes extensive preprocessing of the dataset. This involves tokenizing the sentences, encoding the words as integers, and creating the random permutations. A custom text vectorization layer is created for this purpose.
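A minimal sketch of this preprocessing is shown below. It assumes a TensorFlow/Keras `TextVectorization` layer and plain whitespace tokenization; the corpus, vocabulary size, and sequence length are placeholder values, not the project's actual configuration.

```python
import random

import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

# Placeholder corpus; the real project uses its own dataset of English sentences.
sentences = [
    "the cat sat on the mat",
    "transformers capture long range dependencies",
]

# Map each word to an integer id. Vocabulary size and sequence length
# are illustrative choices.
vectorizer = TextVectorization(
    max_tokens=10_000,
    output_mode="int",
    output_sequence_length=32,
)
vectorizer.adapt(sentences)

def make_example(sentence):
    """Return (permuted input ids, original target ids) for one sentence."""
    words = sentence.split()
    shuffled = words[:]
    random.shuffle(shuffled)                       # random permutation of the words
    source = vectorizer([" ".join(shuffled)])[0]   # ids of the shuffled sentence
    target = vectorizer([sentence])[0]             # ids in the original order
    return source, target
```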
The model itself is a Transformer, a type of neural network that uses self-attention mechanisms to capture the dependencies between words in a sentence. The model has an encoder-decoder architecture, where the encoder processes the input sentence and the decoder generates the reconstructed sentence.
The Transformer architecture used in this project follows this encoder-decoder design; its size is controlled by hyperparameters such as the number of layers and the embedding dimension.
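The sketch below shows one possible single-block realization of such an encoder-decoder Transformer in Keras. The hyperparameter values, the use of `MultiHeadAttention`, and the learned positional embeddings are assumptions for illustration, not the project's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative hyperparameters, not the project's actual values.
VOCAB_SIZE, SEQ_LEN, EMBED_DIM, NUM_HEADS, FF_DIM = 10_000, 32, 256, 4, 512

class TokenAndPositionEmbedding(layers.Layer):
    """Token embedding plus a learned positional embedding."""
    def __init__(self, seq_len, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(vocab_size, embed_dim)
        self.pos_emb = layers.Embedding(seq_len, embed_dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.token_emb(x) + self.pos_emb(positions)

# Encoder: self-attention over the shuffled input sentence.
enc_inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype="int64")
x = TokenAndPositionEmbedding(SEQ_LEN, VOCAB_SIZE, EMBED_DIM)(enc_inputs)
attn = layers.MultiHeadAttention(NUM_HEADS, EMBED_DIM // NUM_HEADS)(x, x)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(FF_DIM, activation="relu")(x)
enc_outputs = layers.LayerNormalization()(x + layers.Dense(EMBED_DIM)(ff))

# Decoder: causal self-attention plus cross-attention to the encoder output.
dec_inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype="int64")
y = TokenAndPositionEmbedding(SEQ_LEN, VOCAB_SIZE, EMBED_DIM)(dec_inputs)
self_attn = layers.MultiHeadAttention(NUM_HEADS, EMBED_DIM // NUM_HEADS)(
    y, y, use_causal_mask=True)                    # causal mask keeps the decoder autoregressive
y = layers.LayerNormalization()(y + self_attn)
cross = layers.MultiHeadAttention(NUM_HEADS, EMBED_DIM // NUM_HEADS)(y, enc_outputs)
y = layers.LayerNormalization()(y + cross)
ff_d = layers.Dense(FF_DIM, activation="relu")(y)
y = layers.LayerNormalization()(y + layers.Dense(EMBED_DIM)(ff_d))
logits = layers.Dense(VOCAB_SIZE)(y)               # per-position scores over the vocabulary

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
```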
The model is trained to map each permuted sentence back to its original word order: the encoder receives the shuffled sentence, and the decoder learns to generate the original sentence token by token.
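Continuing from the sketches above, training could look like the following. The use of teacher forcing, the Adam optimizer, and the batch size are assumptions; `shuffled_ids` and `target_ids` stand in for arrays produced by the preprocessing step, with targets one position longer so they can be shifted.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the real preprocessed data:
#   shuffled_ids: (N, SEQ_LEN)     ids of the permuted sentences
#   target_ids:   (N, SEQ_LEN + 1) ids of the original sentences, assumed to
#                                  start with a [START] token for teacher forcing
shuffled_ids = np.zeros((8, 32), dtype="int64")
target_ids = np.zeros((8, 33), dtype="int64")

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(
    [shuffled_ids, target_ids[:, :-1]],   # encoder input, decoder input (shifted right)
    target_ids[:, 1:],                    # next-token labels
    batch_size=64,
    epochs=10,
)
```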
The model is evaluated with a metric that measures how accurately sentences are reconstructed: the length of the longest common subsequence between the original and the reconstructed sentence, divided by the length of the original sentence. A higher ratio indicates a better reconstruction. The model is scored with this metric on a separate, held-out test set.
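The metric can be implemented directly in plain Python; the sketch below follows the description above (the project's actual implementation may differ in details such as tokenization).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a, 1):
        for j, wb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if wa == wb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def reconstruction_score(original, reconstructed):
    """Ratio of the LCS length to the number of words in the original sentence."""
    orig_words = original.split()
    return lcs_length(orig_words, reconstructed.split()) / len(orig_words)

# Five of the six original words appear in order, so the score is 5/6 ≈ 0.83.
print(reconstruction_score("the cat sat on the mat", "cat the sat on the mat"))
```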
The proposed Transformer model demonstrates good performance in reconstructing sentences, outperforming Seq2Seq models with LSTM encoders and decoders in capturing long-term dependencies and syntactic structures.
The Transformer model is tested on a set of 3,000 randomly selected instances. The results are promising: the model achieves an average score of approximately 0.51 with a standard deviation of 0.28, indicating that it can reconstruct the original sentence with reasonable accuracy.
The model's architecture, which has less than 10 million parameters, is discussed in detail. The impact of various hyperparameters, such as the number of layers and the embedding dimensions, on the model's size and performance is analyzed.
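For intuition, a rough back-of-the-envelope estimate of how these choices drive the parameter count is sketched below; the counting rules ignore biases and layer norms, and the example values are illustrative, not the project's actual hyperparameters.

```python
def approx_transformer_params(vocab, d_model, d_ff, n_layers):
    """Rough parameter estimate for an encoder-decoder Transformer
    (embeddings + attention + feed-forward; biases and layer norms ignored)."""
    embeddings = vocab * d_model                   # token embedding table
    attention = 4 * d_model * d_model              # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff              # two dense layers
    encoder_block = attention + feed_forward
    decoder_block = 2 * attention + feed_forward   # self- plus cross-attention
    output_head = d_model * vocab                  # final projection to the vocabulary
    return embeddings + n_layers * (encoder_block + decoder_block) + output_head

# e.g. a 10k-word vocabulary, d_model=256, d_ff=512 and 2 layers
# comes to roughly 7.7 million parameters, inside a 10M budget.
print(approx_transformer_params(10_000, 256, 512, 2))
```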
The Transformer model shows promise in the task of sentence reconstruction. It is able to capture long-term dependencies and syntactic structures in the sentences, outperforming previous LSTM-based Seq2Seq models. This is achieved within the constraints of the project's parameter limit.
The project concludes by selecting the model with fewer parameters as the more challenging and more interesting solution. This model strikes a balance between performance and efficiency and represents a promising direction for future work.