Analyze WhatsApp chat
The script reads a chat exported from WhatsApp and extracts statistics from it, such as message counts per sender, the most common words, emoji and attachments. Clone the repository, install the required packages, then run the script:
$ git clone https://github.com/PetengDedet/WhatsApp-Analyzer.git
$ cd WhatsApp-Analyzer
$ pip install -r requirements.txt
$ python whatsapp_analyzer.py chat_example.txt --stopword indonesian
usage: python whatsapp_analyzer.py FILE [-h] [-d] [-s] [-c]
Read and analyze whatsapp chat
positional arguments:
  FILE                  Chat file path
optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           Debug mode. Shows details for every parsed line.
  -s , --stopword       Stop Words: A stop word is a commonly used word (such
                        as 'the', 'a', 'an', 'in'). In order to get insightful
                        most common word mentioned in the chat, we need to
                        skip these type of word. The Allowed values are:
                        arabic, bulgarian, catalan, czech, danish, dutch,
                        english, finnish, french, german, hebrew, hindi,
                        hungarian, indonesian, italian, malaysian, norwegian,
                        polish, portuguese, romanian, russian, slovak,
                        spanish, swedish, turkish, ukrainian, vietnamese
  -c , --customstopword
                        Custom Stop Words. File path to stop word. File must a
                        raw text. One word for every line
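The options above can be sketched with Python's standard argparse module. This is a hypothetical parser that mirrors the documented flags, not the code in whatsapp_analyzer.py; the function name build_parser and the metavars LANGUAGE and PATH are illustrative assumptions.

```python
import argparse

# Languages accepted by --stopword, copied from the help text above.
STOPWORD_LANGUAGES = [
    "arabic", "bulgarian", "catalan", "czech", "danish", "dutch",
    "english", "finnish", "french", "german", "hebrew", "hindi",
    "hungarian", "indonesian", "italian", "malaysian", "norwegian",
    "polish", "portuguese", "romanian", "russian", "slovak",
    "spanish", "swedish", "turkish", "ukrainian", "vietnamese",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Read and analyze whatsapp chat")
    parser.add_argument("file", metavar="FILE", help="Chat file path")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Debug mode. Shows details for every parsed line.")
    parser.add_argument("-s", "--stopword", metavar="LANGUAGE",
                        choices=STOPWORD_LANGUAGES,
                        help="Built-in stop word list to skip when counting words")
    parser.add_argument("-c", "--customstopword", metavar="PATH",
                        help="Raw text stop word file, one word per line")
    return parser


# Same invocation as the example command at the top of this README.
args = build_parser().parse_args(["chat_example.txt", "--stopword", "indonesian"])
print(args.file, args.stopword, args.debug)  # chat_example.txt indonesian False
```

Because --stopword is declared with choices, argparse rejects any language outside the list and exits with a usage error instead of silently ignoring it.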
I've included stop words for several languages from https://github.com/Alir3z4/stop-words.
You can also use your own stop word file: pass the -c argument followed by the file path. The file must be raw text with one word per line, like below:
able
ableabout
about
above
abroad
abst
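A stop word file like the one above could be read and applied roughly as follows. This is a minimal sketch, assuming words are compared case-insensitively and split with a simple \w+ regular expression; parse_stopwords, load_stopwords and most_common_words are hypothetical helpers, not functions from the script.

```python
import re
from collections import Counter
from pathlib import Path


def parse_stopwords(text: str) -> set[str]:
    """One stop word per line; surrounding whitespace and blank lines are ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stopwords(path: str) -> set[str]:
    """Read a raw text stop word file such as the one passed with -c."""
    return parse_stopwords(Path(path).read_text(encoding="utf-8"))


def most_common_words(messages, stopwords, n=10):
    """Count words across all messages, skipping stop words (case-insensitive)."""
    counts = Counter(
        word
        for message in messages
        for word in re.findall(r"\w+", message.lower())
        if word not in stopwords
    )
    return counts.most_common(n)


stopwords = parse_stopwords("able\nableabout\nabout\nabove\nabroad\nabst\n")
# "about", "above" and "the" are skipped, so "trip" ranks first.
print(most_common_words(["What about the trip?", "About the trip: above all"],
                        stopwords | {"the"}, n=3))
```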
The script handles chat lines in several export formats, for example:
"14/10/18, 11:16 - Contact Name: this is a message"
"2/30/18, 2:07 AM - Contact Name: Test👌"
"[30/12/18 4.59.25 PM] User Name: 🙏test"
"[06/07/17 13.23.30] +62 123-456-78910: image omitted"
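The sample lines differ in brackets, time separators (":" or "."), 12- and 24-hour clocks, and date order. One way to recognize the start of a message in all of them is a single regular expression, sketched below under the assumption that a header is a date, a time, then either "] " or " - ". HEADER, ParsedLine and parse_header are hypothetical names, not the script's own. The date and time stay as strings because the samples mix day-first (14/10/18) and month-first (2/30/18) dates, so no single date order can be assumed.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical header pattern covering the sample formats:
#   optional "[", date, optional ",", time with ":" or "." plus optional
#   seconds and AM/PM, then "]" or " -", then a space.
HEADER = re.compile(
    r"^\[?(?P<date>\d{1,2}/\d{1,2}/\d{2,4}),?\s+"
    r"(?P<time>\d{1,2}[:.]\d{2}(?:[:.]\d{2})?(?:\s?[AP]M)?)"
    r"(?:\]|\s-)\s(?P<rest>.*)$"
)


class ParsedLine(NamedTuple):
    date: str              # kept as text: day/month order varies
    time: str
    sender: Optional[str]  # None when the line has no "Sender: " part
    message: str


def parse_header(line: str) -> Optional[ParsedLine]:
    """Return the parsed header, or None if the line does not start a message."""
    match = HEADER.match(line)
    if match is None:
        return None
    rest = match.group("rest")
    sender, sep, message = rest.partition(": ")
    if not sep:
        # Assumption: no "Sender: " part means a system/event line.
        return ParsedLine(match.group("date"), match.group("time"), None, rest)
    return ParsedLine(match.group("date"), match.group("time"), sender, message)


print(parse_header("[30/12/18 4.59.25 PM] User Name: 🙏test"))
# ParsedLine(date='30/12/18', time='4.59.25 PM', sender='User Name', message='🙏test')
```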
How the script identifies and classifies chat lines
                            +-------------+
                            | Empty line? |
                            +------+------+
                                   |
                  +----------------+----------------+
                  |                                 |
              +---v---+                         +---v---+
              |  Yes  |                         |  No   |
              +-------+                         +---+---+
                                                    |
                                   +----------------+-----------------+
                                   |                                  |
                             +-----v-----+                      +-----v-----+
                             |    Chat   |                      | Event Log |
                             +-----+-----+                      +-----------+
                                   |
                  +----------------+-----------------+
                  |                                  |
          +-------v-------+                   +------v------+
          | Regular Chat  |                   | Attachment  |
          +-------+-------+                   +------+------+
                  |                                  |
        +---------+---------+                        |
        |                   |                        |
+-------v-------+  +--------v--------+    +----------v----------+
| Starting Line |  | Following Line  |    | Classify Attachment |
+-------+-------+  +--------+--------+    +----------+----------+
        |                   |                        |
        |           +-------v-------+                |
        +---------->+ COUNTER       +<---------------+
                    | 1 Chat        |
                    | 2 Timestamp   |
                    | 3 Sender      |
                    | 4 Domain      |
                    | 5 Words       |
                    | 6 Attachment  |
                    | 7 Emoji       |
                    +-------+-------+
                            |
                      +-----v-----+
                      | Visualize |
                      +-----------+