bixin

Chinese sentiment analysis based on dictionaries and rules.

CHANGELOG

Prior to v0.0.4, bixin depended on cppjieba-py, which requires compiling C++11 code and is therefore hard to install, so I decided to switch to jieba_fast.

The switch solves the following problems:

the cppjieba-py dependency is hard to install
user dictionaries can't be loaded
word segmentation differs from jieba

However, it is slower than using cppjieba-py.

Installation

> pip3 install bixin

Usage

    from bixin import predict
    text = ""
    # 
    predict(text)
    # sentiment score: 0.42

The sentiment score ranges from -1 to 1.
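To work with the score as a class label, a small helper can bucket it. The sketch below is only an illustration: the function name `label_from_score`, the `neutral_band` parameter, and the choice of 0 as the dividing point are assumptions, not part of bixin's API. The labels `p` and `n` follow the tags used in the test data.

```python
def label_from_score(score, neutral_band=0.0):
    """Map a sentiment score in [-1, 1] to 'p', 'n', or 'neutral'.

    Hypothetical helper: the cut-off at 0 and the optional neutral band
    are assumptions made for illustration, not bixin's documented behavior.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in the range -1 to 1")
    if score > neutral_band:
        return "p"        # positive
    if score < -neutral_band:
        return "n"        # negative
    return "neutral"      # inside the band around zero


if __name__ == "__main__":
    # 0.42 is the example score shown in the usage snippet.
    print(label_from_score(0.42))
```

Passing a wider band, for example `neutral_band=0.1`, treats weak scores on either side of zero as neutral.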

predict loads the dictionary data the first time it is called. To load the data manually beforehand, call predict.classifier.initialize().

Accuracy

Tested on a tagged corpus of 6226 texts, a mix of shopping reviews, Sina Weibo tweets, hotel reviews, news, and financial news.

accuracy: 0.827771

Note: neutral texts are all ignored.

For details about the test dataset, see the wiki.

Development

> pip3 install -e ".[dev]" git+https://github.com/bung87/bixin

./dictionaries: dictionaries from various sources
./data: dictionaries processed by ./scripts/tagger.py
./scripts/release_data.py: releases the data to the package
./scripts/score.py: runs accuracy testing with all .txt files under the test_data directory; each file has one sentence per line, ending with a space and a tag, n or p

All data archives: https://github.com/bung87/bixin/releases/tag/v0.0.1
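The test-data format (one sentence per line, ending with a space and a tag n or p) can be parsed and scored roughly as follows. This is a minimal sketch and not the code in ./scripts/score.py: the names `parse_tagged_line` and `accuracy` are hypothetical, the scoring function is passed in as `predict_fn` so the sketch runs without bixin installed, and treating a score above 0 as `p` is an assumption.

```python
def parse_tagged_line(line):
    """Split 'sentence tag' into (sentence, tag), where tag is 'n' or 'p'."""
    line = line.rstrip("\r\n")
    # The tag is whatever follows the last space on the line.
    sentence, sep, tag = line.rpartition(" ")
    if not sep or not sentence or tag not in ("n", "p"):
        raise ValueError("expected '<sentence> <n|p>', got: %r" % line)
    return sentence, tag


def accuracy(lines, predict_fn):
    """Return the fraction of tagged lines that predict_fn classifies correctly.

    predict_fn takes a sentence and returns a score in [-1, 1].
    Assumption: a score above 0 counts as 'p', anything else as 'n'.
    """
    total = 0
    correct = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        sentence, tag = parse_tagged_line(line)
        predicted = "p" if predict_fn(sentence) > 0 else "n"
        total += 1
        if predicted == tag:
            correct += 1
    if total == 0:
        raise ValueError("no tagged lines found")
    return correct / total
```

With bixin installed, `predict_fn` could be the `predict` function from the usage section, and `lines` could be an open file from the test_data directory.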

Test

nosetests -c nose.cfg for a single Python version
tox for multiple Python versions

Acknowledgments

bixin was inspired by dongyuanxin's DictEmotionAlgorithm

License

MIT bung
