text-vector-visualisation

Website: https://rohetoric.github.io/text-vector-visualisation/

APACHE-2.0 License


Exploration & Visualisation of FastText Word Vectors Using TensorFlow 1 and 2

Requirements and Dependencies

The following libraries must be installed to run the code:

Serial No   Libraries to Install
1.          FastText
2.          TensorFlow
3.          spaCy
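
Before running the notebooks, it can help to confirm that the three libraries are importable. The following is a minimal sketch, not part of the project. The import names (fasttext, tensorflow, spacy) are assumptions based on the usual package names, and missing_libraries is a hypothetical helper.

```python
import importlib.util

# Display name -> import name. The import names are assumptions:
# the README lists the libraries, not the modules they install.
REQUIRED_MODULES = {
    "FastText": "fasttext",
    "TensorFlow": "tensorflow",
    "spaCy": "spacy",
}


def missing_libraries(modules=REQUIRED_MODULES):
    """Return the display names of libraries that cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_libraries()
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up; it does not import them, so it stays fast even when TensorFlow is installed.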

Steps to Execute

  1. Download the bbc-text.csv dataset from here. Alternatively, if gcloud is already set up, download it from the terminal with the command: gsutil cp gs://dataset-uploader/bbc/bbc-text.csv [path to notebook directory]

  2. Make sure all the libraries listed under Requirements and Dependencies above are installed and up to date.

  3. To train the model on the complete dataset above using FastText, run the notebook fasttextmodeltrain.ipynb in the _notebooks folder. A pre-trained model (2.4 GB) based on the dataset can be downloaded from here.
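
For orientation, the training step can be sketched roughly as below. This is not the code in fasttextmodeltrain.ipynb. It assumes the CSV has a column named text, that the vectors are trained unsupervised with the skipgram model, and the helper names csv_to_corpus and train_vectors are hypothetical.

```python
import csv


def csv_to_corpus(csv_path, corpus_path, text_column="text"):
    """Write one document per line, the plain-text input fastText trains on.

    The column name "text" is an assumption about bbc-text.csv.
    Returns the number of documents written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Collapse runs of whitespace (including newlines inside a
            # cell) so that each document stays on a single line.
            dst.write(" ".join(row[text_column].split()) + "\n")
            count += 1
    return count


def train_vectors(corpus_path, model_path):
    """Train word vectors and save the model (requires the fasttext package)."""
    import fasttext  # imported here so the CSV helper works without it

    # Skipgram is an assumed choice; the notebook may use other settings.
    model = fasttext.train_unsupervised(input=corpus_path, model="skipgram")
    model.save_model(model_path)
    return model
```

A typical call would be csv_to_corpus("bbc-text.csv", "corpus.txt") followed by train_vectors("corpus.txt", "model.bin"); both file names are placeholders.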

According to the FastText documentation:

Steps 4, 5 and 6 differ for TF1 and TF2. After that, the steps are the same.
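
As a rough illustration of what the TF2 variant of these steps produces, the sketch below writes the files that TensorBoard's Projector reads from a log folder such as tb2files. It follows the pattern in the TensorBoard Embedding Projector tutorial and is not taken from tb2vis.ipynb. The helper names write_metadata and write_projector_files are hypothetical, and words and vectors are assumed to come from the trained FastText model.

```python
import os


def write_metadata(words, log_dir):
    """Write one label per line; row i labels row i of the embedding matrix."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, "metadata.tsv")
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
    return path


def write_projector_files(vectors, log_dir):
    """Save the vectors as a TF2 checkpoint and point the Projector at it.

    `vectors` is assumed to be a 2D array-like of shape (n_words, dim).
    Requires tensorflow and tensorboard.
    """
    import tensorflow as tf
    from tensorboard.plugins import projector

    weights = tf.Variable(vectors)
    checkpoint = tf.train.Checkpoint(embedding=weights)
    checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    # Name under which tf.train.Checkpoint stores the `embedding` variable.
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    embedding.metadata_path = "metadata.tsv"
    projector.visualize_embeddings(log_dir, config)
```

Calling write_metadata(words, "tb2files") and then write_projector_files(vectors, "tb2files") would leave the folder ready for the tensorboard --logdir command used later.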


To Visualise Embeddings Using TF1 [NOT ADVISED]

  4. Create a folder called tb1files in the same directory as the notebooks and keep it empty. It will store all the TensorFlow log files after step 5 is run.

  5. Run the notebook tb1vis.ipynb in the _notebooks folder.

  6. In the terminal, change to the directory where the files are stored and run the command: tensorboard --logdir tb1files/

The above command would yield a result:


To Visualise Embeddings Using TF2 [ADVISED]

  4. Create a folder called tb2files in the same directory as the notebooks and keep it empty. It will store all the TensorFlow log files after step 5 is run.

  5. Run the notebook tb2vis.ipynb in the _notebooks folder.

  6. In the terminal, change to the directory where the files are stored and run the command: tensorboard --logdir tb2files/

The above command would yield a result:

  7. Open the localhost URL shown in the last line of the command's output, for example http://localhost:6008/ [in the TB1 Command image].

  8. The TensorBoard page shown below will open. From the drop-down that reads Inactive, select Projector, as indicated by the arrow in the image below.

  9. This plots the words in TensorBoard's 3D graph according to their embedding values. To find the nearest neighbours of a word, type it in the search bar, as done for the example 'plea' shown below.
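
The nearest-neighbour lookup can also be approximated outside TensorBoard. The sketch below ranks words by cosine similarity, one of the distance measures the Projector offers. The three-dimensional toy_vectors are invented for illustration and are not output of the trained model; nearest_neighbours is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=3):
    """Return the k words most similar to `word`, best match first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors, for illustration only.
toy_vectors = {
    "plea": [0.9, 0.1, 0.0],
    "court": [0.8, 0.2, 0.1],
    "guilty": [0.7, 0.3, 0.0],
    "goal": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for neighbour, score in nearest_neighbours("plea", toy_vectors, k=2):
        print(neighbour, round(score, 3))
```

With real data, the vectors dictionary would be filled from the trained model, one entry per vocabulary word.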

That's it, folks!