Explore NLP tasks with Python using NLTK, SpaCy & scikit-learn: Tokenization, Normalization, NER, POS tagging, Encoding, Word embedding.
MIT License
This repository contains notebooks showcasing various Natural Language Processing (NLP) tasks implemented using Python and popular NLP libraries such as NLTK, SpaCy, and scikit-learn. The notebooks cover a wide range of NLP tasks including tokenization, normalization (stemming and lemmatization), bags of words, named entity recognition (NER), part-of-speech (POS) tagging, different encoding techniques, word embedding using Word2Vec and GloVe, and TF-IDF (Term Frequency-Inverse Document Frequency).
Tokenization : Notebook demonstrating tokenization techniques using NLTK and SpaCy.
Stemming : Implemented stemming techniques with NLTK and SpaCy in Python
Lemmatization : Explored lemmatization methods in Python using NLTK and SpaCy
Named Entity Recognition : Performed Named Entity Recognition (NER) using NLTK and SpaCy in Python. Understand how to identify and extract named entities such as person names, organization names, locations, etc.
Part-of-Speech Tagging : Implemented POS tagging techniques with NLTK and SpaCy in Python. Learn how to assign grammatical categories to words in a text corpus, such as noun, verb, adjective, etc.
Stopwords : Demonstrated stopwords removal techniques using NLTK and SpaCy in Python. Understand how to filter out common words that do not carry significant meaning in text analysis tasks.
Encoding Techniques -
Word Embedding -
This project is licensed under the MIT License - see the LICENSE file for details.