

🌐 Lexicon-NLP-Lab

Welcome to Lexicon-NLP-Lab! This repository is a collection of Jupyter notebooks and datasets covering common Natural Language Processing (NLP) tasks, from text preprocessing to word embeddings and text classification.

📂 Repository Structure

🔍 Data Preprocessing

  • 1_Regex_for_information_extraction.ipynb - Regular expressions for information extraction.
  • 2_Spacy_vs_Nltk.ipynb - Comparison of spaCy and NLTK for tokenization.
  • 3_Spacy_Tokenize.ipynb - Tokenization techniques using spaCy.
  • 4_Spacy_Pipelines.ipynb - spaCy pipelines: stemming and lemmatization.
  • 5_Stemming_Lemmatization.ipynb - Stemming and lemmatization methods.
  • 5_Stemming_Lemmatization_2.ipynb - Continuation of stemming, lemmatization, and POS tagging.
  • 6_Parts_of_Speech_2.ipynb - POS tagging, Bag of Words, and NER with spaCy.
  • 6_Parts_of_Speech_in_Spacy.ipynb - Detailed POS tagging with spaCy.

🏷️ Named Entity Recognition (NER)

  • 7_NER.ipynb - Named entity recognition with spaCy.
  • 7_NER_2.ipynb - Additional NER tasks and implementations.

🗃️ Bag of Words and N-Grams

  • 8_Bag_of_Words_2_SentimentAnalysis.ipynb - Sentiment analysis using Bag of Words.
  • 8_Bag_of_Words_SpamClassifier.ipynb - Spam classification with Bag of Words.
  • 9_Stop_Words.ipynb - Handling stop words in text preprocessing.
  • 9_Stop_Words_2.ipynb - Further exploration of stop words, Bag of Words, and N-grams.
  • 10_Bag_of_N_Grams_2_Fake_News_Prediction.ipynb - Fake news prediction using N-grams.
  • 10_Bag_of_N_Grams_News_Classification.ipynb - News classification with N-grams.
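
For readers new to the idea, a bag of n-grams counts every run of n consecutive tokens in a text and ignores where each run occurs. The sketch below is a minimal standard-library illustration, not code taken from the notebooks; the helper names `ngrams` and `bag_of_ngrams` and the whitespace tokenizer are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_min=1, n_max=2):
    """Count all n-grams of length n_min..n_max in a text.

    Tokenization is a plain lowercase whitespace split (an assumption
    for this sketch; real pipelines usually do more cleaning).
    """
    tokens = text.lower().split()
    bag = Counter()
    for n in range(n_min, n_max + 1):
        bag.update(ngrams(tokens, n))
    return bag


bag = bag_of_ngrams("to be or not to be")
print(bag.most_common(3))
```

With `n_min=1` and `n_max=2` the bag holds both single words and word pairs, so a phrase such as "to be" is counted as its own feature alongside "to" and "be".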

🔀 TF-IDF (Term Frequency-Inverse Document Frequency)

  • 11_TF_IDF_2_EmotionDetection.ipynb - Emotion detection using TF-IDF.
  • 11_TF_IDF_TextClassification_Ecommerce_Goods.ipynb - E-commerce goods classification using TF-IDF.
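
As a rough illustration of what TF-IDF computes, the sketch below weights each term by its frequency within a document multiplied by the log of the inverse document frequency, using the textbook form tf × log(N / df). It is a standard-library example under stated assumptions, not the notebooks' implementation: the function name `tf_idf`, the whitespace tokenizer, and the sample documents are all made up for the example, and libraries often apply smoothed or normalized variants of this formula.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    This is the unsmoothed textbook variant, chosen here for clarity.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical product titles, invented for this example.
docs = ["cheap phone case", "cheap laptop bag", "wool winter coat"]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "phone" receives a higher weight than "cheap" because "cheap" also appears in the second document and is therefore less distinctive.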

💡 Word Embeddings and Vectors

  • 12_Overview_Spacy_Word_Vectors.ipynb - Overview of word vectors using spaCy and Gensim.
  • 13_Spacy_Word_Embeddings_News_Category_Classification.ipynb - News category classification using spaCy word embeddings.
  • 14_Nlp_Word_Vectors_Gensim_Overview.ipynb - Overview of word vectors using Gensim.
  • 15_Gensim_w2v_Google_Fake_News_Detection.ipynb - Fake news detection with Gensim.

🚀 FastText Classifier

  • 16_Fasttext_Indian_Food_Receipe_Classification.ipynb - Classification of Indian food recipes using FastText.
  • 17_Fasttext_Ecommerce_Classification.ipynb - E-commerce classification using FastText.

🔧 Miscellaneous

  • cosine_similarity.ipynb - Computing cosine similarity between text vectors.
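
Cosine similarity measures the angle between two vectors: the dot product divided by the product of their lengths. The sketch below applies it to simple word-count vectors using only the standard library. It is an illustration, not the notebook's code; the helper names `term_counts` and `cosine_similarity` and the example sentences are assumptions made for the example.

```python
import math
from collections import Counter


def term_counts(text):
    """Turn a text into a sparse word-count vector (lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty texts, chosen for this sketch
    return dot / (norm_a * norm_b)


doc1 = term_counts("the cat sat on the mat")
doc2 = term_counts("the cat lay on the rug")
doc3 = term_counts("stock prices fell sharply")

sim_12 = cosine_similarity(doc1, doc2)
sim_13 = cosine_similarity(doc1, doc3)
print(sim_12, sim_13)
```

The first two sentences share most of their words and score close to 1, while the third shares none with the first and scores 0.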

📊 Datasets

  • Cleaned_Indian_Food_Dataset.csv - Dataset for Indian food recipe classification.
  • Fake_Real_Data.csv - Dataset containing fake and real news.
  • news_story.txt - Text file with a sample news story.
  • spam.csv - Spam dataset for classification tasks.
  • students.txt - Additional text file for experimentation.