Using natural language processing to analyze people's sentiments and detect suicidal ideation in online social content.
The rise of social media and online communities has created safe and anonymous spaces where individuals share thoughts about their mental health and express their feelings and suffering. To help prevent suicide, it is necessary to detect suicide-related posts and users' suicidal ideation online using natural language processing methods. I focused on the online community Reddit and the social networking site Twitter, and classified users' posts as showing potential suicide risk or no suicide risk using text feature processing, machine learning, and deep learning methods.
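As an illustration of the text feature processing step, here is a minimal cleaning sketch. It is not the code from `Src`; the function names `clean_text` and `tokenize` and the exact cleaning rules (lowercasing, dropping URLs and @mentions, keeping only letters and apostrophes) are assumptions chosen for the example.

```python
import re

# Assumed cleaning rules, not taken from the project's source code.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s']")


def clean_text(text):
    """Lowercase a post and strip URLs, @mentions, digits and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def tokenize(text):
    """Split a cleaned post into whitespace-separated tokens."""
    return clean_text(text).split()
```

The resulting tokens could then be passed to a TF-IDF vectorizer or mapped to word embeddings, depending on the model.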
I collected two data sets, one from Reddit and one from Twitter. The Reddit data set includes 2958 suicidal ideation samples and 5381 non-suicide texts. The Twitter data set has a total of 3000 tweets with suicidal ideation. Reddit data was scraped from subreddits such as 'suicide watch', 'depression', and 'anxiety'. Twitter data was collected by querying keywords such as 'end my life' and 'die'.
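The keyword-based selection can be sketched locally as a simple filter over already downloaded posts. This is only an illustration under assumptions: the real collection code lives in `Data_Collection` and queries the platforms directly, and the names `KEYWORDS`, `build_pattern`, and `filter_posts` are hypothetical.

```python
import re

# Only these two keywords are named in the description; the full list is not shown.
KEYWORDS = ["end my life", "die"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word or phrase."""
    escaped = (re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_posts(posts, keywords=KEYWORDS):
    """Return only the posts that mention at least one keyword."""
    pattern = build_pattern(keywords)
    return [p for p in posts if pattern.search(p)]
```

The word boundaries keep a keyword like 'die' from matching inside unrelated words such as 'diet'.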
Figure: Twitter word cloud (left) and Reddit word cloud (right).
Results of different methods applied
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
RF + TF-IDF | 0.96 | 0.96 | 0.96 | 0.96 |
LSTM + GloVe | 0.97 | 0.97 | 0.97 | 0.97 |
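For reference, the four reported metrics can be computed from true and predicted labels as in the sketch below. It assumes a binary setup where label `1` marks a suicidal post; whether the table uses binary, macro, or weighted averaging is not stated, and `binary_metrics` is a hypothetical helper, not part of the project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For a screening task like this one, recall on the suicidal class is usually the number to watch, since a missed post is costlier than a false alarm.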
- `Dataset`: All the collected and cleaned datasets
- `Data_Collection`: Code for scraping data from Reddit and Twitter
- `Src`: All the source code for text preprocessing and building ML models
- `Pretrained_Models`: All the pretrained models and tokenizers
- `Flask`: Code for the server and model deployment

To run the server:

    cd Flask
    python app.py
Distributed under the MIT License. See `LICENSE` for more information.