
WhatsApp Chat Analysis

In this post, we will walk through a process for analyzing a WhatsApp chat using Python. We will use the text file exported from a particular WhatsApp chat as the data for the analysis. Here, we are going to analyze the chat of a group named "Jujube Enterprises".

Obtain .txt data file from WhatsApp:

Open WhatsApp, go to the chat, and export it without media.
The chat exported from the group "Jujube Enterprises" is a plain text file with a .txt extension that holds the chat information. The file, renamed to "jujube.txt", has data in the following format:
(Figure: jujube.txt)

Required Libraries:

Data Frame generation from jujube.txt:

We will generate a data frame df from the data in the jujube.txt file. The data in the file has to be extracted so that we get a data frame with the columns Date, Time, User and Message.
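The extraction step could look roughly like the sketch below. It is not the notebook's code: the line layout (`12/31/19, 9:15 PM - Jujube CEO: Hello team`), the pattern `LINE_RE` and the helper `parse_chat` are all assumptions made for illustration, so adjust the pattern to whatever your own export looks like.

```python
import re
import pandas as pd

# Assumed line layout (hypothetical; adjust the pattern to your export):
#   12/31/19, 9:15 PM - Jujube CEO: Hello team
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[APap][Mm])?) - "
    r"(?P<body>.*)$"
)

def parse_chat(lines):
    """Turn raw export lines into a DataFrame with Date, Time, User, Message."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match is None:
            # No timestamp: treat the line as a continuation of the previous message.
            if rows:
                rows[-1]["Message"] += "\n" + line
            continue
        body = match.group("body")
        # System notices have no "User: " prefix, so their User stays None.
        user, sep, message = body.partition(": ")
        if not sep:
            user, message = None, body
        rows.append({
            "Date": match.group("date"),
            "Time": match.group("time"),
            "User": user,
            "Message": message,
        })
    return pd.DataFrame(rows, columns=["Date", "Time", "User", "Message"])

# Made-up sample lines standing in for the contents of jujube.txt.
sample = [
    "12/31/19, 9:15 PM - Jujube CEO: Happy new year",
    "everyone!",
    "12/31/19, 9:16 PM - Thunder added Jujube CTO",
    "1/1/20, 10:02 AM - Jujube CTO: <Media omitted>",
]
df = parse_chat(sample)
print(df)
```

For the real file, something like `with open("jujube.txt", encoding="utf-8") as f: df = parse_chat(f)` should work. One known weakness of this sketch: a system notice that itself contains ": " would be split as if it had a user.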

Now that we have the data frame, let's check for the unique users in the chat.
A few messages have None as the User value; we need to remove those rows.
We have four unique users:
Thunder, Jujube CEO, Jujube CTO, Jujube Founder.
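A minimal sketch of this cleanup, assuming the User column holds None for rows that have no sender. The sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample; the real df comes from jujube.txt.
df = pd.DataFrame({
    "User": ["Thunder", None, "Jujube CEO", "Thunder", "Jujube CTO", "Jujube Founder"],
    "Message": ["hi", "Thunder created group", "hello", "ok", "<Media omitted>", "welcome"],
})

# Rows without a user are dropped before any per-user analysis.
df = df.dropna(subset=["User"]).reset_index(drop=True)

# Unique users in order of first appearance.
users = df["User"].unique().tolist()
print(users)
```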
The messages can contain media, links and emojis. We can extract the message types and their counts.
Because we exported the WhatsApp chat data without media, each media message has been replaced by "<Media omitted>". Messages deleted by you have been replaced by the message "You deleted this message"; we will filter these out too.
Let's separate the media messages from the text messages to obtain two data frames, media_messages_df and text_messages_df. The text_messages_df may contain emojis and links. Let's take a look at text_messages_df and at the letter and word counts of its messages:
....
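The split described above can be sketched as follows. The column names `Letter_Count`, `Word_Count` and `Has_Link` are assumptions, not necessarily the names used in the notebook, and "letter count" is taken here to mean the number of characters including spaces.

```python
import pandas as pd

# Made-up sample rows for illustration.
df = pd.DataFrame({
    "User": ["Thunder", "Jujube CEO", "Thunder", "Jujube CTO"],
    "Message": [
        "Good morning team",
        "<Media omitted>",
        "You deleted this message",
        "See https://example.com for details",
    ],
})

MEDIA_MARKER = "<Media omitted>"
DELETED_MARKER = "You deleted this message"

is_media = df["Message"] == MEDIA_MARKER
is_deleted = df["Message"] == DELETED_MARKER

media_messages_df = df[is_media]
# Everything that is neither a media placeholder nor a deletion notice.
text_messages_df = df[~is_media & ~is_deleted].copy()

# Characters per message (spaces included) and whitespace-separated words.
text_messages_df["Letter_Count"] = text_messages_df["Message"].str.len()
text_messages_df["Word_Count"] = text_messages_df["Message"].str.split().str.len()
# Simple link flag: any message containing http:// or https://.
text_messages_df["Has_Link"] = text_messages_df["Message"].str.contains(r"https?://", regex=True)

print(text_messages_df)
```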

Now for an interesting part: let's look at how the text messages are distributed by word count for each user, so that we can determine which user uses the most words in messages.
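One way to compare users is a per-user summary of word counts, as in this sketch. It assumes a `Word_Count` column computed by splitting messages on whitespace, and the data is made up. A histogram per user would show the full distribution; the summary only ranks users.

```python
import pandas as pd

# Made-up text messages for illustration.
text_messages_df = pd.DataFrame({
    "User": ["Thunder", "Jujube CEO", "Thunder", "Jujube CTO"],
    "Message": ["on my way", "please review the draft today", "ok", "done"],
})
text_messages_df["Word_Count"] = text_messages_df["Message"].str.split().str.len()

# Messages sent, total words and average words per message for each user,
# ordered so the user with the most words comes first.
words_per_user = (
    text_messages_df.groupby("User")["Word_Count"]
    .agg(["count", "sum", "mean"])
    .sort_values("sum", ascending=False)
)
print(words_per_user)
```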

Stats of each user:

Let's take a look at the stats of each user, such as the total messages sent, the count of each type of message content, etc.
In a similar manner, we can get the percentage of media shared by each user. For the code, you can scroll down to the Code section, where you can find the notebook.
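A rough sketch of such per-user stats is shown here. It interprets "media %" as each user's share of all media messages in the chat, which is an assumption; the column names and sample rows are also made up.

```python
import pandas as pd

# Made-up sample rows for illustration.
df = pd.DataFrame({
    "User": ["Thunder", "Thunder", "Jujube CEO", "Jujube CEO", "Jujube CTO"],
    "Message": [
        "hello",
        "<Media omitted>",
        "<Media omitted>",
        "https://example.com",
        "<Media omitted>",
    ],
})

# One boolean flag per content type, then aggregate the flags per user.
flags = pd.DataFrame({
    "User": df["User"],
    "Media": df["Message"] == "<Media omitted>",
    "Link": df["Message"].str.contains(r"https?://", regex=True),
})
stats = flags.groupby("User").agg(
    Messages=("Media", "count"),  # total messages sent by the user
    Media=("Media", "sum"),       # messages that were media placeholders
    Links=("Link", "sum"),        # messages containing a link
)
# Share of all media in the chat contributed by each user, in percent.
stats["Media_Pct"] = 100 * stats["Media"] / stats["Media"].sum()
print(stats)
```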
One trend is the users' use of emojis to express their responses. It is interesting to identify the emojis that are used most and least in the chat.
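Emoji usage can be counted with the standard library alone, as in the sketch below. The pattern `EMOJI_RE` is a deliberately rough approximation that covers only common pictograph blocks; it is not the full Unicode emoji definition, and it counts skin-tone modifiers and the parts of joined sequences separately. The helper name and sample messages are made up.

```python
import re
from collections import Counter

# Rough emoji matcher (an approximation covering common pictograph blocks only).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def count_emojis(messages):
    """Count every emoji character found across an iterable of messages."""
    counts = Counter()
    for message in messages:
        counts.update(EMOJI_RE.findall(message))
    return counts

# Made-up messages; with a data frame, pass text_messages_df["Message"] instead.
messages = ["Great work 👍👍", "😂 that was funny", "See you tomorrow", "👍 🎉"]
emoji_counts = count_emojis(messages)

ranked = emoji_counts.most_common()
print("most used:", ranked[0])
print("least used:", ranked[-1])
```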

Line chart to determine the activity in the chat:

A line chart is a useful way to visualize data over time: it displays information as a series of data points connected by lines. We will plot the count of messages against the date, which helps determine the activity in the chat over a period.
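The daily counts behind such a chart could be computed as sketched below. The date format `%m/%d/%y` is an assumption about the export, and `plot_daily_activity` is a hypothetical helper; matplotlib is imported inside it so the counting part runs even without matplotlib installed.

```python
import pandas as pd

# Made-up sample dates for illustration.
df = pd.DataFrame({
    "Date": ["12/30/19", "12/31/19", "12/31/19", "1/2/20"],
    "Message": ["a", "b", "c", "d"],
})

# Assumed month/day/two-digit-year strings; change `format` to match your export.
dates = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Messages per calendar day, with silent days filled in as zero.
messages_per_day = dates.value_counts().sort_index().asfreq("D", fill_value=0)
print(messages_per_day)

def plot_daily_activity(series):
    """Draw message count against date as a line chart and return the figure."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(series.index, series.values)
    ax.set_xlabel("Date")
    ax.set_ylabel("Messages")
    ax.set_title("Messages per day")
    fig.autofmt_xdate()
    return fig
```

Calling `plot_daily_activity(messages_per_day)` followed by `matplotlib.pyplot.show()` displays the chart.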

Active Years:

Let's look at the active years and their activity in terms of messages per year.
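Messages per year can be derived from the Date column, as in this sketch. It assumes `%m/%d/%y` date strings and uses made-up data.

```python
import pandas as pd

# Made-up sample dates for illustration.
df = pd.DataFrame({"Date": ["3/14/18", "12/31/19", "6/1/19", "1/2/20"]})

# Assumed date format; adjust `format` to your export.
years = pd.to_datetime(df["Date"], format="%m/%d/%y").dt.year

messages_per_year = years.value_counts().sort_index()
most_active_year = messages_per_year.idxmax()
print(messages_per_year)
print("most active year:", most_active_year)
```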

Active days of the week:

Let's take a look at the activity on each day of the week, in terms of the number of messages sent on that day.
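A sketch of the weekday breakdown, again assuming `%m/%d/%y` date strings and made-up data. Reindexing against a fixed weekday list keeps the days in calendar order and shows days with no messages as zero.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up sample dates for illustration.
df = pd.DataFrame({"Date": ["12/28/19", "12/30/19", "1/4/20", "1/5/20"]})

# Assumed date format; adjust `format` to your export.
day_names = pd.to_datetime(df["Date"], format="%m/%d/%y").dt.day_name()

messages_per_weekday = day_names.value_counts().reindex(WEEKDAYS, fill_value=0)
print(messages_per_weekday)
print("most active day:", messages_per_weekday.idxmax())
```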

Active Hours:
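Activity per hour of the day can be read from the Time column. The sketch below assumes 12-hour times such as `9:15 PM`; for a 24-hour export, `%H:%M` would be the format to use instead. The data is made up.

```python
import pandas as pd

# Made-up sample times for illustration.
df = pd.DataFrame({"Time": ["9:15 PM", "9:40 PM", "10:05 PM", "8:30 AM"]})

# Assumed 12-hour clock strings; use format="%H:%M" for 24-hour exports.
hours = pd.to_datetime(df["Time"], format="%I:%M %p").dt.hour

# One bucket per hour of the day (0 to 23), including hours with no messages.
messages_per_hour = hours.value_counts().reindex(range(24), fill_value=0)
print(messages_per_hour[messages_per_hour > 0])
print("most active hour:", messages_per_hour.idxmax())
```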

CONCLUSION:

After analyzing the WhatsApp data of the group Jujube Enterprises, we can conclude that there are 4 users in the group, namely 'Thunder', 'Jujube Founder', 'Jujube CTO' and 'Jujube CEO'. The users sent a total of 3053 messages to the group, of which 548 contain media, 1066 contain emojis and 67 contain links. The group messages use 95 distinct emoji types. 2019 was the most active year, with the maximum number of messages sent to the group. Saturday was the day of the week with the most messages sent to the group, and the most active hours were 2100 hours to 2300 hours.

Code:

For notebook CLICK HERE
