Chinese sentiment analysis based on dictionaries and rules.
Prior to v0.0.4, bixin depended on cppjieba-py, which requires C++11 compilation and is therefore hard to install, so I decided to use jieba_fast instead.
This solves the following problems:
cppjieba-py
jieba
However, it is slower than cppjieba-py.
> pip3 install bixin
from bixin import predict
text = ""  # the Chinese text to score goes here
predict(text)
# sentiment score: 0.42
The sentiment score ranges from -1 to 1.
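To turn the score into a coarse label, one option is to threshold it. The sketch below is not part of bixin: the function name `label_from_score` and the `margin` cutoff are hypothetical, and it assumes that scores below zero indicate negative sentiment and scores above zero indicate positive sentiment.

```python
def label_from_score(score, margin=0.1):
    """Map a score in [-1, 1] to a coarse sentiment label.

    `margin` is a hypothetical dead zone around 0 that is treated as
    neutral; tune it for your own data.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be within [-1, 1]")
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


print(label_from_score(0.42))  # positive
```

Setting `margin=0` removes the neutral band for every score except exactly 0.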
predict loads the dictionary data the first time it is called; to load it manually ahead of time, call predict.classifier.initialize()
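To keep the dictionary-loading cost out of the first prediction, the manual initialization can be done once at startup. This is a minimal sketch: the helper name `warm_up_bixin` is hypothetical, and the only bixin calls it relies on are the `predict` import and `predict.classifier.initialize()` mentioned in this README. It returns `None` when bixin is not installed so the snippet can be run anywhere.

```python
import importlib.util


def warm_up_bixin():
    """Eagerly load bixin's dictionaries.

    Returns bixin's `predict` callable, or None if bixin is not installed.
    """
    if importlib.util.find_spec("bixin") is None:
        return None
    from bixin import predict

    # Load the dictionary data now instead of on the first predict() call.
    predict.classifier.initialize()
    return predict


predict = warm_up_bixin()
if predict is None:
    print("bixin is not installed; run: pip3 install bixin")
```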
Tested on a corpus of 6226 tagged texts, a mix of shopping reviews, Sina Weibo tweets, hotel reviews, news, and financial news.
accuracy: 0.827771
Note: neutral texts are all ignored.
For details about the test dataset, see the wiki.
> pip3 install -e ".[dev]" git+https://github.com/bung87/bixin
./dictionaries dictionaries from various sources
./data dictionaries processed by ./scripts/tagger.py
./scripts/release_data.py releases data to the package
./scripts/score.py
all data archives: https://github.com/bung87/bixin/releases/tag/v0.0.1
Accuracy testing runs against all .txt files under the test_data directory. Each file has one sentence per line, and each line ends with a space and a tag, n or p.
nosetests -c nose.cfg
for a single Python version
tox
for multiple Python versions
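The test data format (one sentence per line, ending with a space and a tag n or p) can be read and scored with a few lines of Python. This is a rough sketch, not the project's actual test code: `parse_tagged_line`, `accuracy`, and the stand-in scorer `toy_score` are hypothetical names, and treating a score above 0 as p and anything else as n is an assumption.

```python
def parse_tagged_line(line):
    """Split 'sentence tag' into (sentence, tag); the tag must be 'n' or 'p'."""
    sentence, sep, tag = line.rstrip("\r\n").rpartition(" ")
    if not sep or tag not in ("n", "p"):
        raise ValueError(f"malformed line: {line!r}")
    return sentence, tag


def accuracy(lines, score_fn):
    """Fraction of lines whose tag matches the sign of score_fn(sentence).

    Assumption: a score above 0 counts as 'p', anything else as 'n'.
    """
    samples = [parse_tagged_line(line) for line in lines if line.strip()]
    if not samples:
        raise ValueError("no samples to score")
    correct = sum(
        1
        for sentence, tag in samples
        if ("p" if score_fn(sentence) > 0 else "n") == tag
    )
    return correct / len(samples)


def toy_score(sentence):
    """Stand-in scorer used only for this demo; swap in bixin's predict."""
    return 0.5 if "good" in sentence else -0.5


sample_lines = [
    "good service p\n",
    "bad room n\n",
    "good price n\n",
    "slow delivery n\n",
]
print(accuracy(sample_lines, toy_score))  # 0.75
```

With bixin installed, passing `predict` as `score_fn` and the lines of a test_data file as `lines` would give a comparable figure under the same sign assumption.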
bixin was inspired by dongyuanxin's DictEmotionAlgorithm
MIT bung