NLP-Notebooks

This repository contains notebooks showcasing Natural Language Processing (NLP) tasks implemented in Python with popular NLP libraries such as NLTK, spaCy, scikit-learn, and Gensim. The notebooks cover tokenization, normalization (stemming and lemmatization), stopword removal, named entity recognition (NER), part-of-speech (POS) tagging, text encoding techniques (one-hot encoding, bag of words, and TF-IDF, or Term Frequency-Inverse Document Frequency), and word embeddings with Word2Vec, GloVe, and FastText.

Notebooks

Tokenization : Demonstrates tokenization techniques using NLTK and spaCy.
Stemming : Implements stemming techniques with NLTK and spaCy.
Lemmatization : Explores lemmatization methods using NLTK and spaCy.
Named Entity Recognition : Performs Named Entity Recognition (NER) using NLTK and spaCy, showing how to identify and extract named entities such as person names, organization names, and locations.
Part-of-Speech Tagging : Implements POS tagging with NLTK and spaCy, showing how to assign grammatical categories such as noun, verb, and adjective to the words in a text corpus.
Stopwords : Demonstrates stopword removal using NLTK and spaCy, showing how to filter out common words that carry little meaning in text analysis tasks.
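
The tokenization and stopword removal steps described above can be sketched without any third-party library. This is a minimal illustration, not the code from the notebooks: the regular expression, the function names, and the tiny STOPWORDS set are all assumptions made for the example, and NLTK and spaCy provide far more robust tokenizers and much larger stopword lists.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# NLTK and spaCy ship much larger curated lists.
STOPWORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Keeps simple contractions such as "don't" together as one token.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


sentence = "The cat is sitting in the garden."
tokens = tokenize(sentence)
filtered = remove_stopwords(tokens)
print(tokens)
print(filtered)
```

A regex tokenizer like this ignores punctuation entirely, which is usually fine for bag-of-words style models but loses information that NER and POS tagging rely on.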

Encoding Techniques -
- One Hot Encoding : Converts text documents into binary vectors, using NLTK and spaCy.
- Bag of Words : Represents text documents as vectors of word counts, using NLTK and spaCy.
- TF-IDF : Scores each word in a document by how often it appears in that document (term frequency) and how rare it is across the corpus (inverse document frequency), using NLTK and spaCy.
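
The three encodings above can be sketched in plain Python on a toy corpus. This is an illustrative sketch rather than the notebooks' implementation: the corpus, the function names, and the unsmoothed formula tf * log(N / df) are assumptions chosen for clarity, and libraries such as scikit-learn apply smoothed variants by default, so their scores will differ.

```python
import math
from collections import Counter

# Toy corpus, made up for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

# Fixed, sorted vocabulary so every document maps to the same vector layout.
vocab = sorted({w for doc in tokenized for w in doc})


def binary_vector(tokens):
    """One-hot style encoding: 1 if the word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]


def bag_of_words(tokens):
    """Bag of words: raw count of each vocabulary word in the document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def tf_idf(tokens):
    """TF-IDF with tf = count / document length and idf = log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        # Every vocabulary word occurs in at least one document, so df >= 1.
        df = sum(1 for doc in tokenized if w in doc)
        vector.append(tf * math.log(n_docs / df))
    return vector


print(vocab)
print(binary_vector(tokenized[0]))
print(bag_of_words(tokenized[0]))
print([round(x, 3) for x in tf_idf(tokenized[0])])
```

With this formula a word that appears in every document gets an IDF of zero, which is the intended effect: it cannot help tell the documents apart.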
Word Embedding -
- Word2Vec : Implementation of Word2Vec in Python using both pretrained models and models trained from scratch.
- Avg Word2Vec : Averages the Word2Vec vectors of the words in a document to produce a single fixed-length document vector.
- GloVe : Uses Stanford's pre-trained GloVe model for word embeddings.
- FastText : Uses Gensim and the FastText library for text representation and classification, drawing on subword information and the skip-gram architecture.
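
The averaging idea behind the Avg Word2Vec entry can be sketched as follows. This is a minimal sketch under stated assumptions: the EMBEDDINGS table holds made-up 3-dimensional vectors standing in for a trained model, and the function names are hypothetical. In practice the vectors would come from a trained or pretrained Word2Vec, GloVe, or FastText model.

```python
import math

# Hypothetical 3-dimensional vectors, made up for illustration;
# real Word2Vec or GloVe vectors typically have 50 to 300 dimensions.
EMBEDDINGS = {
    "king": [0.8, 0.1, 0.3],
    "queen": [0.7, 0.2, 0.4],
    "apple": [0.1, 0.9, 0.2],
}


def average_vector(tokens, embeddings=EMBEDDINGS, dim=3):
    """Average the vectors of all known tokens into one document vector.

    Tokens missing from the embedding table are skipped. If no token is
    known, a zero vector of length `dim` is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


royal = average_vector(["king", "queen", "unknown"])
fruit = average_vector(["apple"])
print(royal)
print(cosine_similarity(royal, fruit))
```

Averaging discards word order, so it works best as a quick baseline for tasks such as document similarity or classification.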