underthesea

Underthesea - Vietnamese NLP Toolkit

GPL-3.0 License

Downloads: 34.7K
Stars: 1.4K
Committers: 14


underthesea - Underthesea 1.1.7

Published by rain1024 over 6 years ago

✨ Major Features and Improvements

  • API CHANGE: Change word_sent function to word_tokenize

🔴 Bug fixes

  • Fix dependency hell (#174)

🗎 Documentation and examples

  • Add Vietnamese README page README.vi.rst
  • Update style in README.rst page

🔊 Release Notes

The main focus of this release is fixing the dependency hell error reported by @dthphuong and @YannDubs. The fix speeds up the installation of underthesea and removes all unnecessary dependencies from the default install.

Another important update is an API change. We renamed the word_sent function to word_tokenize, which is a better name for the word segmentation task.
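For code that still calls the old name, the rename can be bridged with a small deprecation shim. The following is only a sketch of that general pattern, not something shipped in underthesea: `deprecated_alias` is a hypothetical helper, and the `word_tokenize` defined here is a whitespace-splitting placeholder standing in for the real function.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that emits a DeprecationWarning, then calls new_func.

    Hypothetical helper for illustration; not part of underthesea.
    """

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    return wrapper


def word_tokenize(sentence):
    # Placeholder stand-in: splits on whitespace only. The real
    # underthesea.word_tokenize performs Vietnamese word segmentation.
    return sentence.split()


# Keep the old name as a thin alias so existing call sites keep working.
word_sent = deprecated_alias(word_tokenize, "word_sent")
```

In a real project, the placeholder would be replaced by `from underthesea import word_tokenize`, and call sites would be migrated to the new name over time.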

Contributors

Thanks to @rain1024, @JackNhat for the contributions!

underthesea - Underthesea 1.1.6

Published by rain1024 almost 7 years ago

✨ Major Features and Improvements

  • NEW: Implement a Vietnamese aspect sentiment analysis in banking social data.
  • NEW: Improve languageflow project with new models (KimCNNCLassifier, XGBoostClassifier), develop LanguageBoard to visualize and inspect features and trained models.

🔴 Bug fixes

  • Fix bug when tokenizing strings containing "=" (#159)

🗎 Documentation and examples

🔊 Release Notes

The main feature of this release is aspect sentiment analysis. We conducted a number of experiments on social media posts from the banking domain. Traditional classifiers such as SVM, Naive Bayes, and Gradient Boosting Tree with count and tf-idf features still yield better results (59.5% F1 score) than deep learning models like fasttext and CNN. You can view a live demo of Vietnamese aspect sentiment analysis in the underthesea service.
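For readers unfamiliar with the metric quoted above, F1 is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The function name and the counts in the usage note are illustrative assumptions, and the release notes do not say which averaging scheme (micro, macro, or per-aspect) produced the reported figure.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with 6 true positives, 2 false positives, and 4 false negatives (made-up numbers), precision is 0.75, recall is 0.6, and the function returns 2/3.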

We renamed the underthesea-flow project to languageflow and integrated new models (KimCNNCLassifier, XGBoostClassifier). See the languageflow documentation for more detail.

Contributors

Thanks to @rain1024, @JackNhat for the contributions!

underthesea - Underthesea 1.1.5

Published by rain1024 about 7 years ago

✨ Major Features and Improvements

  • NEW: Implement a Vietnamese named entity recognition using CRF #90
  • NEW: Create new projects underthesea-flow for NLP experiments, underthesea.amrbank to create a Vietnamese AMR Bank.
  • One-line install is back; models and data are downloaded only on demand.

🔴 Bug fixes

  • Refactor underthesea.word_sent, underthesea.pos_tag, underthesea.chunking projects

🗎 Documentation and examples

🔊 Release Notes

The main feature of this release is named entity recognition. Our experiments focus on conditional random field models, which yield reasonable results and are fast (~20 mins per experiment). For more information about the NER experiments, go to its own repository.
We did a lot of work this month to improve our pipeline, and created a new project, underthesea-flow, for this purpose.
We also created a new project, underthesea.amr, in response to the rise of AMR. Our first goal is to create the first 3000 annotated Vietnamese sentences in our AMR bank.

👥 Contributors

Thanks to @rain1024, @JackNhat, @vunb for the contributions!

underthesea - Underthesea 1.1.4

Published by rain1024 about 7 years ago

✨ Major Features and Improvements

  • NEW: Implement a Vietnamese text classification using fasttext #118

🔴 Bug fixes

  • Fix issue in Text wrapper function

🗎 Documentation and examples

🔊 Release Notes

The main feature of this release is text classification. We experimented with some standard classifiers (Naive Bayes, SVM family, xgboost) and a trendy classifier, fasttext, on a very large Vietnamese news data set (30k sentences). The winner is fasttext because it is very fast and yields the best accuracy and F1 score. For more information about the classification experiments, follow the link to its own repository.

We're afraid we can't support one-line install, because of the many dependencies that come with v1.1.4 (fasttext, sklearn). Another reason is that we want to separate models from code. So after installing underthesea, you must take one small extra step: downloading the models. Check out how to make underthesea work in four lines in the Installation section.

See you next release!

underthesea - Underthesea 1.1.3

Published by rain1024 about 7 years ago

✨ Major Features and Improvements

underthesea - Underthesea 1.1.0

Published by rain1024 over 7 years ago

Word Segmentation, POS Tagging, Chunking
Supports Python 2 only
