In my applied statistics course at the Ruhr West University of Applied Sciences, held by Prof. Dr. Eimler, I built a toolchain to extract the results of a study that looked into the connection between personality traits and emoji usage. The study was presented at the GOR Congress in Cologne.
Research Question: Do the ‘recently used emojis’ on your smartphone predict personality traits or vice versa?
Unlike common psychology studies, we used a bottom-up process: instead of formulating a hypothesis about the expected outcome first and then comparing the statistical results against it, we first collected data from the participants and drew conclusions afterward.
An example of the ‘recently used emojis’ dock from WhatsApp on Android
Basically, we wanted to learn about emoji usage by assessing WhatsApp screenshots of the ‘recently used emojis’, and about the participants’ personality by letting them answer questions on psychological scales (Big Five Inventory, Happiness and Life Satisfaction, Impression Motivation) as well as demographic questions.
Possible questions were…
Some of the latent variables we aimed for were happiness and life satisfaction, emotional stability or openness to new experiences.
A fundamental difference from existing methods of measuring emoji usage was that we used real screenshots instead of asking participants to reproduce their real-world behaviour.
Possible Outcomes were…
Fun Bonus: Make an oracle which tells you facts about your personality by uploading your own ‘recently used emojis’ screenshot.
Let’s say we have 300 participants with around 35 emojis in every screenshot. That makes around 10,000 emojis in total, which is a lot of emojis to make mistakes on when identifying them by hand.
To solve that problem, a chain of tools was built, including a Python script that identifies the emojis in the screenshots, a UI for reviewing the extracted metadata, and a web scatter plot visualising connections between certain personality traits:
The purpose of the Emoji Identifier is to cut the emojis out of the screenshots and match each one against all 2.7k emojis by Euclidean distance. To speed up the process, the emojis were scaled down to 24x24 pixels. Identification then took around 2 minutes per screenshot. The identifier is written in Python 3 using OpenCV.
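The matching step can be illustrated with a minimal sketch. It is not the identifier's actual code: the function name `nearest_emoji`, the flat pixel lists, and the toy 2x2 "thumbnails" are assumptions for illustration, and it uses only the standard library instead of OpenCV. The idea is simply to compare one cut-out cell with every reference thumbnail and keep the one with the smallest Euclidean distance.

```python
import math

def nearest_emoji(cell, references):
    """Return (key, distance) of the reference thumbnail closest to `cell`.

    `cell` is a flat sequence of pixel intensities; `references` maps an
    emoji key to a flat sequence of the same length (hypothetical layout).
    """
    best_key, best_dist = None, math.inf
    for key, ref in references.items():
        d = math.dist(cell, ref)  # Euclidean distance between the two pixel vectors
        if d < best_dist:
            best_key, best_dist = key, d
    return best_key, best_dist

# Toy 2x2 grayscale "thumbnails" stand in for the 24x24 images (made-up values).
references = {
    "face-with-tears-of-joy_1f602": [255, 200, 0, 30],
    "full-moon-symbol_1f315": [250, 240, 120, 110],
}
cell = [252, 205, 5, 28]  # a cut-out cell, slightly noisy

key, dist = nearest_emoji(cell, references)
print(key, round(dist, 2))
```

Scaling the images down before comparing keeps each distance computation cheap, since the cost grows with the number of pixels times the number of reference emojis.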
The identifier outputs JSON (and saves it to the database) including…
Example Output:
{
"recognitions": [{
"key": "face-with-tears-of-joy_1f602",
"hex": "1f602",
"name": "face-with-tears-of-joy"
}, {
"key": "smiling-face-with-heart-shaped-eyes_1f60d",
"hex": "1f60d",
"name": "smiling-face-with-heart-shaped-eyes"
}
// … 33 more items
],
"suggestions": [
[{
"key": "face-with-tears-of-joy_1f602",
"hex": "1f602",
"name": "face-with-tears-of-joy"
}, {
"key": "full-moon-symbol_1f315",
"hex": "1f315",
"name": "full-moon-symbol"
},
// … 33 more items
],
],
"meta": {
"rows": 5,
"columns": 7,
"emoji_type": "apple",
"runtime": 166
}
}
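The `rows` and `columns` values in `meta` suggest that the dock is treated as a regular grid (5 × 7 = 35 cells, matching the roughly 35 emojis per screenshot). Assuming the dock area has already been cropped and the cells are evenly spaced, the cut-out boxes could be computed as in the following sketch; the function `grid_cells` and the pixel dimensions are hypothetical.

```python
def grid_cells(width, height, rows, columns):
    """Return (left, top, right, bottom) boxes for an evenly spaced grid, row by row."""
    cell_w = width / columns
    cell_h = height / rows
    boxes = []
    for r in range(rows):
        for c in range(columns):
            boxes.append((
                round(c * cell_w), round(r * cell_h),
                round((c + 1) * cell_w), round((r + 1) * cell_h),
            ))
    return boxes

# Assumed dock size of 700x500 pixels with the grid from the example output.
boxes = grid_cells(width=700, height=500, rows=5, columns=7)
print(len(boxes), boxes[0], boxes[-1])
```

Each box can then be used to slice one emoji out of the screenshot before it is scaled down and matched.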
We limited the screenshots to WhatsApp. On iPhones, WhatsApp uses the native keyboard emojis, whereas on Android it has its own so-called WhatsApp emojis. We agreed that the difference is not significant and that we would interpret them the same way.
The purpose of the Review UI was to make it easier for the team to fix incorrect recognitions from the identifier.
We saved the JSON data in MongoDB together with the uploaded screenshot, so we had a single place to edit the resulting identification.
This is what fixing an incorrect recognition looks like: the original screenshot on the left and the recognition itself on the right.
To get a better understanding of the data, we used a scatter chart, built with D3.js, to visualize emojis in relation to the personality traits.
🎉 Demo: You can play around with the chart here.
It seemed like an obvious approach to show off the results of the study in a more entertaining and interactive way, because people are more engaged when you make an estimation about them.
Sample output:
You are female. You did not have a crush. You are the life of the party. You feel the feelings of others. You are slightly annoyed. You get jobs done right away. You have excellent ideas.
The estimations are calculated from the mean of every assessed personality trait per emoji. The value of each personality trait is sorted into a bucket, and a random phrase is then picked for that bucket’s direction.
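As a rough sketch of the "mean per emoji" idea: for every emoji, average the trait scores of all participants who had it in their screenshot. How these per-emoji means are combined for a new screenshot is not spelled out here, so averaging them again in `estimate` is an assumption, as are the data layout, the function names, and the scores.

```python
from collections import defaultdict
from statistics import mean

def trait_means_per_emoji(participants):
    """Map emoji -> {trait: mean score of all participants who used that emoji}."""
    scores = defaultdict(lambda: defaultdict(list))
    for p in participants:
        for emoji in p["emojis"]:
            for trait, value in p["traits"].items():
                scores[emoji][trait].append(value)
    return {e: {t: mean(v) for t, v in traits.items()} for e, traits in scores.items()}

def estimate(emojis, emoji_means):
    """Average the per-emoji means over one screenshot (assumed combination rule)."""
    collected = defaultdict(list)
    for emoji in emojis:
        # Emojis never seen in the study data contribute nothing.
        for trait, value in emoji_means.get(emoji, {}).items():
            collected[trait].append(value)
    return {t: mean(v) for t, v in collected.items()}

# Made-up participants, identified by emoji hex codes.
participants = [
    {"emojis": ["1f602", "1f60d"], "traits": {"NEUROTICISM": 4.0}},
    {"emojis": ["1f602"], "traits": {"NEUROTICISM": 2.0}},
]
emoji_means = trait_means_per_emoji(participants)
estimation = estimate(["1f602", "1f60d"], emoji_means)
print(estimation)
```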
Example:
We have a JSON list of positive / negative effect phrases (from Wikipedia) for every personality trait.
{
"NEUROTICISM": {
"positive": ["You are easily irritated.", "You are easily stressed.", "You are slightly annoyed.", "You often have mood swings.", "You are worried about things.", "You are much more anxious than most people."],
"negative": [
"You are relaxed most of the time.",
"You rarely feel depressed."
]
},
// …
}
The buckets look like this…
That means if a participant had a neuroticism value of 3.8 the output could be: You are worried about things.
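The phrase selection could be sketched as follows. The actual bucket boundaries are not reproduced here, so the single cut-off `MIDPOINT = 3.0` is purely a placeholder assumption, as are the function names; the phrases are a subset of the list above.

```python
import random

PHRASES = {
    "NEUROTICISM": {
        "positive": ["You are easily stressed.", "You are worried about things."],
        "negative": ["You are relaxed most of the time.", "You rarely feel depressed."],
    },
}

# Hypothetical cut-off: the real bucket boundaries are not stated in the text.
MIDPOINT = 3.0

def direction(value, midpoint=MIDPOINT):
    """Sort a trait value into the 'positive' or 'negative' bucket."""
    return "positive" if value > midpoint else "negative"

def pick_phrase(trait, value, phrases=PHRASES, rng=random):
    """Pick a random phrase for the direction the trait value falls into."""
    return rng.choice(phrases[trait][direction(value)])

# A seeded generator makes the pick repeatable; the default is the random module.
phrase = pick_phrase("NEUROTICISM", 3.8, rng=random.Random(0))
print(phrase)
```

With this placeholder cut-off, a neuroticism value of 3.8 lands in the positive bucket, so one of the positive phrases is returned.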
This is the congress poster we submitted to the Global Online Research committee. It received a rating of 8 out of 10 from two international peer reviewers.