Can tweets infer personality differences between iPhone and Android users?
To begin answering this question, I first created an app on Twitter: https://apps.twitter.com/app/new
Once made, in the application settings I clicked “manage keys and access tokens”. On this new page you can generate a consumer key and consumer secret, as well as an access token and access token secret. These four codes are needed during Twitter authentication when retrieving tweets, so they are important to write down. In the code below, you will need to add your own authentication codes, as I have removed mine for security reasons. Once this is done, the only thing you’ll need to change is the save destination of your files. The rest should run without editing, as long as you have installed the required packages.
Then in R Studio, I used the following code to get tweets:
```r
# Removes everything from the workspace
rm(list = ls())

# Required packages
library(twitteR)
library(tidyr)
library(dplyr)
library(purrr)

# Twitter authentication
consumerKey <- "YOUR-CONSUMER-KEY"
consumerSecret <- "YOUR-CONSUMER-SECRET"
accessToken <- "YOUR-ACCESS-TOKEN"
accessTokenSecret <- "YOUR-ACCESS-TOKEN-SECRET"
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessTokenSecret)

# Searches Twitter in line with the desired search criteria and puts the
# results in a list. This function requires the twitteR package.
twitterlist <- searchTwitter(" ", n = 10000, lang = "en", since = NULL,
                             until = NULL, locale = NULL, geocode = NULL,
                             sinceID = NULL, maxID = NULL,
                             resultType = "recent", retryOnRateLimit = 50)
```
I then cleaned the tweets by removing retweets, duplicate users and repeated tweets. It was important to ensure that I did not have two tweets from the same user, so that the sample was independent.
```r
# Removes retweets from the Twitter list
minusretweetstwitterlist <- strip_retweets(twitterlist, strip_manual = TRUE,
                                           strip_mt = TRUE)

# Converts the list into an easy-to-read data frame
twitterlist_df <- tbl_df(map_df(minusretweetstwitterlist, as.data.frame))

# Flags any duplicated Twitter user names, so that we get an independent sample
duplicateuserscheck <- duplicated(twitterlist_df$screenName, incomparables = FALSE)

# Adds a duplicateuserscheck column to the data frame; TRUE marks a duplicate user
twitter_df_duplicateusers <- cbind(twitterlist_df, duplicateuserscheck)

# Removes duplicate users by dropping the rows where duplication was TRUE
withoutduplicateusers <- subset(twitter_df_duplicateusers,
                                subset = duplicateuserscheck == FALSE)

# Flags any duplicated tweets
duplicatetweetscheck <- duplicated(withoutduplicateusers$text, incomparables = FALSE)

# Adds a duplicatetweetscheck column; TRUE marks a duplicate tweet
twitter_df_duplicatetweets <- cbind(withoutduplicateusers, duplicatetweetscheck)

# Removes duplicate tweets by dropping the rows where duplication was TRUE
withoutANYduplicates <- subset(twitter_df_duplicatetweets,
                               subset = duplicatetweetscheck == FALSE)
```
Finally, I wanted to save this data, and have separate files for iPhone and Android tweets.
```r
# Makes a data frame that only contains tweets from iPhone or Android phones,
# keeping the tweet content and the time/date of the tweet
tweets <- withoutANYduplicates %>%
  select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))

# Saves the combined Twitter data as a CSV file
write.csv(tweets, file = "/Users/heathershaw/Desktop/extractedtweets/data.csv",
          row.names = FALSE)

# Creates a data frame that only contains iPhone data
IphoneDF <- withoutANYduplicates %>%
  select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone"))

# Creates a data frame that only contains Android data
AndroidDF <- withoutANYduplicates %>%
  select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("Android"))

# Saves both of these files
write.csv(IphoneDF, file = "/Users/heathershaw/Desktop/extractedtweets/iPhonedata.csv",
          row.names = FALSE)
write.csv(AndroidDF, file = "/Users/heathershaw/Desktop/extractedtweets/Androiddata.csv",
          row.names = FALSE)
```
You should be left with three files. The data file containing both iPhone and Android tweets looks like this:
To analyse this text, I used the LIWC 2015 software to assess the psychological properties of iPhone and Android users’ tweets.
Because the Twitter data is saved as a CSV file, we can analyse every person (row) of data individually. We don’t need to copy and paste the tweets into a Word file to get an overall score for each dimension. This means every individual has their own LIWC score on each dimension, which was useful for the statistical analysis I discuss further down in this post.
The output of the LIWC analysis looks like this once saved as an Excel file:
To make this final CSV file ready for importing back into R, I replaced the headers Source (A), Source (B), Source (C) and Source (D) with the titles underneath (id, source, text and created), and then deleted the now-empty row.
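If you prefer not to edit the file by hand in Excel, the same header fix can be scripted in R. This is only a sketch: the stand-in file below mimics the layout described above (a row of Source (A)-style headers with the real titles underneath), so adjust the file name and columns to your own LIWC output.

```r
# Create a tiny stand-in for the LIWC output (the real file has many more columns)
demo <- tempfile(fileext = ".csv")
writeLines(c("Source (A),Source (B),Source (C),Source (D),WC",
             "id,source,text,created,WC",
             "123,iPhone,hello world,2016-01-01,2"), demo)

# Read without headers, promote the second line to column names,
# then drop both header rows so only the data remains
raw <- read.csv(demo, header = FALSE, stringsAsFactors = FALSE)
names(raw) <- as.character(unlist(raw[2, ]))
cleaned <- raw[-c(1, 2), ]

print(names(cleaned))  # id, source, text, created, WC
```

Writing `cleaned` back out with `write.csv(..., row.names = FALSE)` then gives a file ready for `read.csv` later in the analysis.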
I then repeated these LIWC steps for the iPhone and Android data.
In particular, I was interested in whether Android and iPhone users differed on four summary variables, each scored on a scale from 0 to 100. The algorithms behind these scores are not available due to the authors’ prior commercial agreements. The variables are:
○ Analytical thinking – a high number reflects formal, logical, and hierarchical thinking; lower numbers reflect more informal, personal, here and now, narrative thinking.
○ Clout – a high number suggests that the author is speaking from the perspective of high expertise and is confident; low Clout numbers suggest a more tentative, humble, even anxious style.
○ Authentic – higher numbers are associated with a more honest, personal, and disclosing text; lower numbers suggest a more guarded, distanced form of discourse.
○ Emotional tone – a high number is associated with a more positive, upbeat style; a low number reveals greater anxiety, sadness, or hostility. A number around 50 suggests either a lack of emotionality or different levels of ambivalence.
All of these summary variables refer to writing styles, which are used to infer personality traits. The associated papers describe how each score was developed:
The analytical thinking score was developed from a dataset of around 50,000 admissions essays from a large state university across the years 2004–2007. The authors found that higher grades were associated with greater article (a, an, the) and preposition (to, above) use. Lower grades were associated with greater use of auxiliary verbs (is, have), personal pronouns (I, her, they), impersonal pronouns (it, thing), adverbs (so, really, very), conjunctions (and, but) and negations (no, never). This was used to develop the categorical–dynamic index (CDI) using principal component analysis. It is a bipolar scale, because the more students used articles and prepositions, the less they used pronouns and other function words. On one end is categorical language, “which combines heightened abstract thinking (associated with greater article use) and cognitive complexity (associated with greater use of prepositions). A lower CDI involves greater use of auxiliary verbs, adverbs, conjunctions, impersonal pronouns, negations, and personal pronouns. These word categories, particularly pronouns and auxiliary verbs, have been associated with more time-based stories, and reflect a dynamic or narrative language style.” The authors later found that higher CDI scores were associated with higher academic performance as measured by GPA (grade point average).
Pennebaker, J. W., Chung, C. K., Frazee, J., Lavergne, G. M., & Beaver, D. I. (2014). When small words foretell academic success: The case of college admissions essays. PLoS ONE, 9(12), 1–10. http://doi.org/10.1371/journal.pone.0115844
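While LIWC 2015’s Analytic algorithm itself is proprietary, the paper above does report a unit-weighted recipe for the CDI built from the eight function-word percentages. The sketch below illustrates that recipe; the weights and the example percentages are my reading of the paper, not LIWC’s actual formula, so treat them as an assumption.

```r
# Illustrative categorical-dynamic index (CDI) from LIWC function-word
# percentages, following the unit-weighted formula reported in
# Pennebaker et al. (2014): articles and prepositions raise the score,
# the dynamic/narrative categories lower it.
cdi <- function(article, prep, ppron, ipron, auxverb, conj, adverb, negate) {
  30 + article + prep - ppron - ipron - auxverb - conj - adverb - negate
}

# A formal, noun-heavy style scores high ...
formal <- cdi(article = 12, prep = 16, ppron = 2, ipron = 3,
              auxverb = 5, conj = 4, adverb = 3, negate = 1)

# ... while a narrative, pronoun-heavy style scores low.
narrative <- cdi(article = 5, prep = 9, ppron = 12, ipron = 7,
                 auxverb = 11, conj = 8, adverb = 7, negate = 2)

print(c(formal = formal, narrative = narrative))  # 40 and -3
```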
The clout score was developed from five studies which measured language differences between people of different ranks or status. In study 1, a leader was randomly assigned to each group through a bogus leadership questionnaire, and for 30 minutes the group had to agree on a series of decisions. In study 2, participants worked in pairs to solve complex problems over instant messenger, and self-reported perceived power through the questions “To what degree did you control the conversation?” and “To what degree did you have power in the conversation?”. In study 3, people talked face to face about everyday topics; the conversations were transcribed, and participants rated their self-perceived power in the same way as in study 2. Study 4 measured emails between participants and their correspondents, with participants rating their own status relative to each chosen correspondent on a scale from 1 (other has much lower status) to 7 (other has much higher status). Study 5 measured military letters between soldiers of the Iraqi military associated with Saddam Hussein’s regime. Overall, they found that pronoun use reflects position in the social hierarchy. First-person singular pronouns (I, me) were associated with lower status and suggest more self-attention. First-person plural (we, us) and second-person singular pronouns (you, your) were used more by those with higher status.
Kacewicz, E., Pennebaker, J. W., Davis, M., Jeon, M., & Graesser, A. C. (2014). Pronoun Use Reflects Standings in Social Hierarchies. Journal of Language and Social Psychology, 33(2), 125–143. http://doi.org/10.1177/0261927X13502654
The authentic score was developed across five studies which compared the linguistic properties of false and true stories. In study 1, participants were taped discussing both true and false views on abortion, and were asked to be as believable as possible. In study 2, participants typed both true and false views on abortion, and were encouraged to be as persuasive as possible. In study 3, participants hand-wrote both true and false views on abortion, and again were asked to be as truthful and as deceptive as possible. In study 4, participants provided verbal true and false descriptions of people they genuinely liked and disliked, and were again asked to be honest and convincing. Study 5 was a mock crime scenario in which half the participants were asked to look around a room and the other half were told to steal a dollar bill. All participants were then accused of taking the money and told to deny the accusation; they were told that if the interrogator was convinced of their innocence, they would get the dollar bill. Across all the studies, liars used first-person singular pronouns (I, my, me) at a lower rate than truth-tellers. Second, liars used negative emotion words (hate, worthless, enemy) more than truth-tellers. Third, liars used fewer exclusive words (but, except, without), which are normally associated with cognitive complexity and are used to mark what belongs in a given category and what does not. Finally, liars used third-person pronouns (he, she, they) at a lower rate. The research found that LIWC classified liars and truth-tellers with 67% accuracy.
Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: predicting deception from linguistic cues. Personality and Social Psychology Bulletin, 29(5), 665–675. http://doi.org/10.1177/0146167203251529
The emotional tone score was developed by analysing the diaries of 1,084 online journal users over a period of four months, spanning the weeks before and after the September 11th attacks in 2001. When positive and negative emotion words were analysed, the September 11 attacks reduced positivity by 1.36 standard deviations on average, and positivity then increased monotonically over the following weeks until it returned to baseline. This bipolar emotional-positivity scale was calculated as the difference between LIWC scores for positive emotion words (happy, good, nice) and negative emotion words (kill, ugly, guilty). Higher scores mean greater positivity.
Cohn, M. A., Mehl, M. R., & Pennebaker, J. W. (2004). Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science, 15(10), 687–694.
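The underlying difference calculation is simple to sketch. LIWC 2015 rescales its Tone score to 0–100 with an unpublished algorithm, so the toy version below, with its tiny hand-made word lists, is only an illustration of the positive-minus-negative idea, not the real scoring.

```r
# Toy emotional-positivity score: % positive emotion words minus % negative
# emotion words. The dictionaries here are made-up stand-ins, not LIWC's.
positive_words <- c("happy", "good", "nice")
negative_words <- c("kill", "ugly", "guilty")

tone_diff <- function(text) {
  words <- tolower(unlist(strsplit(text, "\\s+")))
  pos <- 100 * sum(words %in% positive_words) / length(words)
  neg <- 100 * sum(words %in% negative_words) / length(words)
  pos - neg
}

print(tone_diff("what a happy good day"))    # 40  (positive text, positive score)
print(tone_diff("guilty and ugly weather"))  # -50 (negative text, negative score)
```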
To compare iPhone and Android users on these traits, I re-imported my data back into R.
```r
# Removes everything from the workspace
rm(list = ls())

# Data analysis
# Set the working directory to the folder where the data is stored
setwd("/Users/heathershaw/Desktop/extractedtweets/")

# Import the data CSVs into R
twitdata <- read.csv("LIWC2015data.csv", header = TRUE)
iPhoneData <- read.csv("LIWC2015iPHONE.csv", header = TRUE)
AndroidData <- read.csv("LIWC2015ANDROID.csv", header = TRUE)
```
Then I calculated averages to explore the data:
```r
# Calculate the overall average word count per tweet
AvWordcountAll <- mean(twitdata[, "WC"])
print(AvWordcountAll)

# Calculate the average % of words in the tweets recognised by the LIWC dictionary
AveragePercentageWordsInDic <- mean(twitdata[, "Dic"])
print(AveragePercentageWordsInDic)
```
The data consisted of 1027 Android user tweets and 2209 iPhone user tweets. On average, 55.57% of the words used in the tweets were recognised by the LIWC dictionary. The average word count for both iPhone and Android user tweets was 8.5 words.
Next, I conducted four t tests, with corresponding r effect sizes, to see whether the writing styles of Android and iPhone users differed.
```r
# t test - do analytical scores differ between iPhone and Android users?
analyticalttest <- t.test(iPhoneData$Analytic, AndroidData$Analytic, paired = FALSE)

# Calculates the r effect size for the analytical t test
analyticaltstat <- analyticalttest$statistic[[1]]
analyticaldf <- analyticalttest$parameter
analyticalreffectsize <- sqrt(analyticaltstat^2 / (analyticaltstat^2 + analyticaldf))
names(analyticalreffectsize) <- "r"

# t test - do clout scores differ between iPhone and Android users?
cloutttest <- t.test(iPhoneData$Clout, AndroidData$Clout, paired = FALSE)

# Calculates the r effect size for the clout t test
clouttstat <- cloutttest$statistic[[1]]
cloutdf <- cloutttest$parameter
cloutreffectsize <- sqrt(clouttstat^2 / (clouttstat^2 + cloutdf))
names(cloutreffectsize) <- "r"

# t test - do authentic scores differ between iPhone and Android users?
authenticttest <- t.test(iPhoneData$Authentic, AndroidData$Authentic, paired = FALSE)

# Calculates the r effect size for the authentic t test
authentictstat <- authenticttest$statistic[[1]]
authenticdf <- authenticttest$parameter
authenticreffectsize <- sqrt(authentictstat^2 / (authentictstat^2 + authenticdf))
names(authenticreffectsize) <- "r"

# t test - do tone scores differ between iPhone and Android users?
tonettest <- t.test(iPhoneData$Tone, AndroidData$Tone, paired = FALSE)

# Calculates the r effect size for the tone t test
tonetstat <- tonettest$statistic[[1]]
tonedf <- tonettest$parameter
tonereffectsize <- sqrt(tonetstat^2 / (tonetstat^2 + tonedf))
names(tonereffectsize) <- "r"

# Print the results
analyticalttest
analyticalreffectsize
cloutttest
cloutreffectsize
authenticttest
authenticreffectsize
tonettest
tonereffectsize
```
These are the results:
Mean of x refers to iPhone users and mean of y to Android users.
```r
> analyticalttest

	Welch Two Sample t-test

data:  iPhoneData$Analytic and AndroidData$Analytic
t = -4.5366, df = 2118.1, p-value = 6.038e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.810887 -3.492376
sample estimates:
mean of x mean of y
 60.30098  66.45261

> analyticalreffectsize
         r
0.09809708

> cloutttest

	Welch Two Sample t-test

data:  iPhoneData$Clout and AndroidData$Clout
t = -0.11113, df = 2224.8, p-value = 0.9115
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.328311  2.078587
sample estimates:
mean of x mean of y
 52.09501  52.21987

> cloutreffectsize
          r
0.002355968

> authenticttest

	Welch Two Sample t-test

data:  iPhoneData$Authentic and AndroidData$Authentic
t = 3.9712, df = 2132.6, p-value = 7.389e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.760522 8.146903
sample estimates:
mean of x mean of y
 28.82325  23.36954

> authenticreffectsize
         r
0.08567817

> tonettest

	Welch Two Sample t-test

data:  iPhoneData$Tone and AndroidData$Tone
t = 1.4893, df = 2116.2, p-value = 0.1366
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6013394  4.3975189
sample estimates:
mean of x mean of y
 40.56254  38.66445

> tonereffectsize
         r
0.03235706
```
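As a quick sanity check, the r effect sizes can be recomputed by hand from the t and df values printed in the output, using r = sqrt(t² / (t² + df)):

```r
# Recompute the effect size for the Analytic comparison from the printed output
t_stat <- -4.5366
df     <- 2118.1
r      <- sqrt(t_stat^2 / (t_stat^2 + df))
print(r)  # ~0.098, matching analyticalreffectsize
```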
As this sample is just a snippet of the data available on Twitter, it would be worth re-running the experiment several times and conducting a meta-analysis afterwards. On this measure, we found that Android users were more analytical and iPhone users more authentic.