A Bot That Can Tell When It's Really Donald Trump Who's Tweeting
Using machine learning to figure out when the president is writing his own tweets
For at least two years, an open secret lurked in the metadata behind President Donald Trump’s personal Twitter account. Folks quickly noticed that his boring tweets (event announcements, press releases on polls) were usually sent from an iPhone, probably a staffer’s. The 3 a.m. rants, on the other hand, were generally sent from an Android. Guess which kind of phone Trump uses personally?
This was great for data-minded journalists like me, because we could infer when Trump himself was tweeting, tapping away at his Samsung Galaxy. Trump tweets were quantifiably different from staffer tweets: angrier and posted later at night (not to mention more poorly spelled).
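For the curious, that inference can be sketched in a few lines of Python. This is only an illustration, assuming each tweet record carries a `source` string naming the posting client (sometimes wrapped in an HTML link). The function name `label_by_source` and the sample records are made up, and the labels encode the rule of thumb described above, not a verified fact about any given tweet.

```python
import re

# Hypothetical helper: map a tweet's "source" metadata to a likely author.
# Android -> Trump and iPhone -> staff is the article's rule of thumb only.
def label_by_source(source: str) -> str:
    # The source may arrive wrapped in an HTML anchor, so strip any tags first.
    client = re.sub(r"<[^>]+>", "", source).strip().lower()
    if "android" in client:
        return "trump"
    if "iphone" in client:
        return "staff"
    return "unknown"

# Made-up sample records, for illustration only.
tweets = [
    {"text": "Join us tonight! #MAGA",
     "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"text": "The dishonest media is at it again!",
     "source": "Twitter for Android"},
    {"text": "Scheduled post",
     "source": "Twitter Web Client"},
]

labels = [label_by_source(t["source"]) for t in tweets]
print(labels)  # ['staff', 'trump', 'unknown']
```

Labels produced this way are what make the historical record usable as training data: every old tweet comes with a best guess at who wrote it.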
But in recent weeks, the Android tweets have slowed to a trickle, an indication that the White House might finally be taking the security risks posed by the president’s Twitter account seriously. Trump now appears to post mostly from an iPhone, if he still tweets at all.
But @TrumpOrNotBot is on the case. It’s a Twitter bot that uses machine learning and natural language processing to estimate the likelihood that Trump wrote a tweet himself. By comparing new tweets to the president’s massive Twitter record, the bot is able to tell with reasonable certainty whether Trump is behind the keyboard, even if he’s chucked Android for Apple.
This tweet was sent via Twitter for iPhone. I compute a 84% chance it was written by Trump himself. https://t.co/vs2nmC3IjI
— Trump or Not (@TrumpOrNotBot) March 28, 2017
The bot is pretty good at figuring out when Trump is talking. When tested against a mix of 2016 tweets, it correctly flagged the ones sent from an Android 90 percent of the time. It’s a bit worse at figuring out when a staffer has tweeted, incorrectly attributing iPhone tweets to Trump around 25 percent of the time, perhaps because staffers sometimes work to imitate his style.
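Those two figures measure different things: the share of Android tweets the bot catches, and the share of iPhone tweets it wrongly pins on Trump. Here is a minimal sketch of how both could be computed, assuming each test tweet has a true label (from the phone that sent it) and a predicted label. The function name and the tiny test set are invented, sized only so the arithmetic mirrors the percentages above.

```python
def detection_rates(true_labels, predicted_labels):
    """Return (hit rate on Trump tweets, false-alarm rate on staff tweets)."""
    trump_total = trump_hits = staff_total = staff_false_alarms = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == "trump":
            trump_total += 1
            trump_hits += guess == "trump"
        else:
            staff_total += 1
            staff_false_alarms += guess == "trump"
    # Assumes the test set contains at least one tweet of each kind.
    return trump_hits / trump_total, staff_false_alarms / staff_total

# Made-up test set: 10 Android (Trump) tweets and 4 iPhone (staff) tweets.
truth = ["trump"] * 10 + ["staff"] * 4
guess = ["trump"] * 9 + ["staff"] + ["trump"] + ["staff"] * 3

hit_rate, false_alarm_rate = detection_rates(truth, guess)
print(f"{hit_rate:.0%} of Android tweets flagged; "
      f"{false_alarm_rate:.0%} of iPhone tweets misattributed")
```

Reporting the two rates separately matters because a bot that called every tweet a Trump tweet would score perfectly on the first number and terribly on the second.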
It’s fun to see which words the algorithm found most helpful in attributing a tweet to Trump or a staffer. Most of them aren’t words at all, but quirks of spelling or punctuation.
- “@realDonaldTrump” (Trump/staff ratio = 14 : 1)
Trump was 14 times more likely than a staffer to mention his own Twitter handle, probably because he frequently quotes tweets about himself.
- “#” (Trump/staff ratio = 1 : 5)
Staffer Trump uses hashtags all the time, something Android Trump doesn’t bother with much.
- “Media” (Trump/staff ratio = 5 : 1)
With a president who is obsessed with news coverage and the “dishonest media,” does this surprise you?
- “@foxnews” (Trump/staff ratio = 3 : 1)
Trump’s preferred cable channel gets a bump in his own tweets.
- “ ” (Trump/staff ratio = 8 : 1)
The president is also far more likely to include extra spaces in his tweets.
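How ratios like these are calculated isn't spelled out here, so treat the following as a rough sketch of one plausible approach: compare the share of Trump tweets containing a token with the share of staff tweets containing it. The function name `usage_ratio` and all of the sample tweets are invented for illustration and are not the bot's actual code or data.

```python
def usage_ratio(token, trump_tweets, staff_tweets):
    """Share of Trump tweets containing `token`, divided by the staff share.

    Assumption: "ratio" means a ratio of per-tweet usage shares.
    """
    def share(tweets):
        return sum(token.lower() in t.lower() for t in tweets) / len(tweets)

    trump_share, staff_share = share(trump_tweets), share(staff_tweets)
    if staff_share == 0:
        # Staff never used the token: infinite if Trump did, undefined (0) otherwise.
        return float("inf") if trump_share > 0 else 0.0
    return trump_share / staff_share

# Made-up sample tweets, for illustration only.
trump_tweets = [
    '"@fan: @realDonaldTrump is right!" Thank you.',
    "The dishonest media will never admit it.",
    '"@voter: @realDonaldTrump We are with you" Great!',
    "Watching @foxnews tonight.",
]
staff_tweets = [
    "Join me in Ohio tomorrow! #MAGA",
    "Thank you Florida! #AmericaFirst",
    "New poll out today. #Trump2016 @realDonaldTrump",
    "Tickets available now #MAGA",
]

print(usage_ratio("@realDonaldTrump", trump_tweets, staff_tweets))  # 2.0
print(usage_ratio("#", trump_tweets, staff_tweets))                 # 0.0
```

With only a handful of tweets the ratios swing wildly, and a token the staff never used produces an infinite ratio. On a record of thousands of tweets the numbers settle down, though a real implementation would probably smooth the counts.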
If you’re into this kind of stuff, take a look at the classifier code here. I’ll keep improving the bot, and I welcome suggestions.
This algorithm is already out of date. Some of the most important words of 2016—“Ted Cruz”, “Bernie”, “Crooked Hillary”—are unlikely to remain helpful in classifying the president’s future tweets. But absent Trump’s return to his trusty Android phone, machine learning can give the public its best chance of figuring out when @realDonaldTrump really is the real Donald Trump.