You should read "Private Traits and Attributes are Predictable from Digital Records of Human Behavior", and not just because it is an excellent machine learning paper. No, you should read it because it's downright funny. It is the rarest of birds: a piece of technical writing that makes you laugh out loud, and not in the bad way.
The paper, published in 2013, uses a very simple pipeline to predict user behaviors and characteristics using nothing but Facebook likes. It does absurdly, horrifyingly well, providing near-perfect accuracy for determining gender and ethnicity and impressive performance for other features. Others have caught on: the paper has been cited 109 times in its fourteen months, giving a citation rate of about once every four days, which is quite a bit higher than the typical case of no times per ever.
But this isn't why you should read it. You should read it because of lines like:
- "The best predictors of high intelligence include Thunderstorms, The Colbert Report, Science, and Curly Fries."
- "Strong predictors of male heterosexuality included Wu-Tang Clan, Shaq, and Being Confused After Waking Up from Naps."
- "There is no obvious connection between intelligence and curly fries."
- "Users who liked the Hello Kitty brand tended to be high on Openness and low on Conscientiousness, Agreeableness, and Emotional Stability."
The 'Selected most predictive Likes' list in the supplemental material is also a wealth of lol. For instance:
- Science is predictive of both high IQ and dissatisfaction with life.
- 'Join If Ur Fat' predicts spontaneity.
- 'Pushing your friends into random people in the hallway' predicts drug use.
- White people love Bret Michaels, David Bowie, and Halloween.