bixin

Chinese text sentiment analysis

Chinese sentiment analysis based on dictionaries and rules.

CHANGELOG

Prior to v0.0.4, bixin depended on cppjieba-py, which requires C++11 compilation and is therefore hard to install, so I decided to switch to jieba_fast.

This solves the following problems:

  • the cppjieba-py dependency is hard to install
  • user dictionaries can't be loaded
  • word segmentation differs from jieba

However, it is slower than using cppjieba-py.

Installation

> pip3 install bixin

Usage

    from bixin import predict

    text = ""  # the Chinese text to analyze
    score = predict(text)
    # e.g. sentiment score: 0.42

The sentiment score is in the range -1 to 1.

predict loads the dictionary data on first use; to load it manually, call predict.classifier.initialize().
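Because the score is a single number between -1 and 1, a caller will often want a coarse label as well. The following is a minimal sketch, not part of bixin: `label_from_score` and its `neutral_band` parameter are hypothetical, and it assumes that scores above 0 mean positive sentiment and scores below 0 mean negative sentiment, which this README does not state explicitly.

```python
def label_from_score(score, neutral_band=0.0):
    """Map a sentiment score in [-1, 1] to a coarse label.

    Hypothetical helper, not part of bixin. Assumes score > 0 means
    positive sentiment and score < 0 means negative sentiment.
    Scores within +/- neutral_band of zero are reported as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in the range -1 to 1")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

With bixin installed, this could be called as `label_from_score(predict(text))`. A non-zero `neutral_band` is one way to avoid over-reading scores that sit very close to zero.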

Accuracy

Tested on a tagged corpus of 6226 texts, mixing shopping reviews, Sina Weibo tweets, hotel reviews, news, and financial news.

accuracy: 0.827771

Note: neutral texts are all ignored.

For details about the test dataset, see the wiki.

Development

> pip3 install -e ".[dev]" git+https://github.com/bung87/bixin

./dictionaries dictionaries from various sources
./data dictionaries processed by ./scripts/tagger.py
./scripts/release_data.py release data to package

./scripts/score.py

all data archives: https://github.com/bung87/bixin/releases/tag/v0.0.1

Run accuracy testing with all .txt files under the test_data directory. Each file has one sentence per line, ending with a space and a tag, n or p.
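The line format and the accuracy computation described above can be sketched as follows. This is an illustration, not the code of ./scripts/score.py: `parse_tagged_line` and `accuracy` are hypothetical names, the prediction function is passed in as `predict_fn` (for example bixin's `predict`), and the rule that a score above 0 counts as `p` and anything else as `n` is an assumption.

```python
def parse_tagged_line(line):
    """Split '<sentence> <tag>' into (sentence, tag); the tag is 'n' or 'p'."""
    sentence, _, tag = line.rstrip().rpartition(" ")
    if tag not in ("n", "p") or not sentence:
        raise ValueError("expected '<sentence> <n|p>', got %r" % line)
    return sentence, tag


def accuracy(lines, predict_fn):
    """Fraction of tagged lines whose predicted tag matches the given tag.

    Assumption: a score above 0 is read as 'p', anything else as 'n'.
    """
    total = 0
    correct = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        sentence, tag = parse_tagged_line(line)
        predicted = "p" if predict_fn(sentence) > 0 else "n"
        total += 1
        if predicted == tag:
            correct += 1
    return correct / total if total else 0.0
```

With bixin installed, passing an open file from test_data and `predict` as `predict_fn` would score that single file. Taking the prediction function as a parameter keeps the sketch independent of how the classifier is loaded.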

Test

nosetests -c nose.cfg for a single Python version
tox for multiple Python versions

Acknowledgments

bixin was inspired by dongyuanxin's DictEmotionAlgorithm

License

MIT bung