Get Bent!: An Ideological Classifier. Part 1: False Starts

Back in the middle of August, sadly, President Trump’s brother passed away. The news hit Twitter courtesy of CNN’s Kaitlan Collins around 10:30 that evening, and by the next day it was a huge story. That wasn't because of the tragedy, but because liberal Twitter had apparently caused the hashtag #wrongtrump to trend, implying that it should have been the President who passed.

Something didn’t smell right to me. Going through the top tweets under the hashtag, I noticed that they were mostly right-wing Twitter users commenting on how soulless left-wing Twitter is. I needed to get to the bottom of it all. Is this a sophisticated right-wing plot designed to malign an entire party? Are Democrats really as ghoulish as this hashtag makes them seem? Is Twitter an unproductive echo chamber? Is the world ready for what this project may reveal?

The problem that needed solving was simple: Looking at a politically charged Twitter hashtag, how do I know which users are conservatives and which are liberals? I opened up a fresh Jupyter notebook, imported Pandas, then promised myself that I would neglect all other responsibilities until I had these answers.

And just like that, the game is afoot.

My initial plan was to scrape every tweet involved in that hashtag and see if I noticed a pattern. A Python library, GetOldTweets3 (GOT3), took some finagling, but over the course of an evening I had scraped every single #wrongtrump tweet: the text itself, the exact time, and the name of the user.

Unfortunately, over the course of my exploratory data analysis, two important roadblocks popped up. First, Twitter changed its API, which rendered GOT3 completely useless for future scraping. More importantly, going through the early tweets, I realized that their content was useless for solving my problem.

Sure, by eye alone it’s easy to tell apart those who tweet, “The #wrongtrump died,” from those who tweet, “These liberals are horrible, they got #wrongtrump trending!” But what about variations like “smdh, #wrongtrump” or “wow, #wrongtrump is trending”?

One interesting finding in all of this concerns how a hashtag actually qualifies as “trending.” In my research online I found very little information explaining what metrics the algorithm uses. One website estimated the threshold at roughly 8,000 posts mentioning the topic. Yet from what I found, the first mention of this topic trending came only 23 minutes later, when the hashtag had just 45 tweets. This project does not get to the bottom of that mystery, and these numbers raise more questions than they answer.

Pushing forward, I came across this recent Twitter study by the Pew Research Center:

The findings of this analysis paint a nuanced picture of just how prevalent political speech is among U.S. adults on Twitter. On one hand, 39% of users with public accounts tweeted at least once about national politics — which includes mentions of national politicians, institutions or groups, as well as civic behaviors such as voting — over the study period. On the other hand, national politics is a relatively small element of the total Twitter conversation among U.S. adults. Content explicitly related to these issues made up just 13% of all tweets analyzed over the year studied.

Essentially, this study supports two important assumptions for my project. First, Twitter is an epicenter of political discourse in this country. If 39% of Twitter users with public accounts tweet about politics at least once, I can assume that those tweets likely betray their political leanings, and this assumption gains credence as the number of such tweets grows.

Second, from the study's conclusion that the majority of political tweets are produced by a small number of users, I assume that if you are the type of person to comment on a trending political hashtag, you are also the type of user who has the critical mass of political tweets I need to deduce your ideological bent.

All of this confirmed that I couldn’t use a single tweet to solve my problem. To classify how a user used the hashtag, I needed to classify the users themselves as either conservative or liberal. With a broken Twitter scraper and 30,000+ twitterers to classify, I nearly gave up on the project entirely to go build yet another stock-picking robot.
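Shifting the unit of analysis from tweets to users mostly means regrouping the scraped data so that each account carries all of its tweets. Here is a minimal sketch in plain Python, assuming each scraped record is a dict with hypothetical `user` and `text` keys (an illustrative shape, not the exact format GOT3 produced). The sample records simply reuse the example tweets quoted earlier:

```python
from collections import defaultdict

# Hypothetical records: the "user"/"text" keys and usernames are illustrative only.
tweets = [
    {"user": "alice", "text": "The #wrongtrump died"},
    {"user": "bob", "text": "These liberals are horrible, they got #wrongtrump trending!"},
    {"user": "alice", "text": "smdh, #wrongtrump"},
]

def group_by_user(records):
    """Collect each tweet's text under the account that posted it."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append(rec["text"])
    return dict(by_user)

users = group_by_user(tweets)
# Each key is now one account that needs a conservative/liberal label.
print(len(users))
```

Grouping this way turns tens of thousands of ambiguous tweets into one classification target per account, which is where the assumptions above about prolific political tweeters come into play.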

Luckily, my obsession with this mystery won out and I found a workaround that let me continue my investigation. Stay tuned for the next exciting chapter of Get Bent!: An Ideological Classifier!