The social bot research of Oxford and Co. is flawed.
BOTS! The idea that Russia committed an act of automated cyber warfare in critical elections has spread thanks to the work of researchers from credible institutions. However, despite the major universities that stand behind the researchers, we, as data journalists, see their work as seriously flawed.
Evidence clearly shows that Russia used the Internet and social media in an effort to manipulate the 2016 US elections and the outcome of Brexit. We are not denying this.
We just don’t believe it was the work of bots.
A look into the data shows why.
When we searched for press articles with the terms »social bots US election« and »social bots brexit«, we found that 45 of the top 48 Google News results refer to the work of three research groups:
- Oxford: Work of sociologist Bence Kollanyi of the »Computational Propaganda Research Project« of Oxford University.
- Southern California/Indiana: Work of Emilio Ferrara (University of Southern California) and Alessandro Flammini (Indiana University)
- Berkeley/Swansea: Work of Oleksandr Talavera and Tho Pham of Swansea University and Yuriy Gorodnichenko of the University of California at Berkeley.
These are widely recognized and respected institutions, but the devil is in the details: Which definition did the researchers use to determine what qualifies an account as a social bot? How did the researchers find social bots? And how did they prove that those accounts were in fact bots?
Let’s dig in.
In the press, »Bots and Automation over Twitter during the U.S. Election« was by far the most quoted publication. This study looked at hashtags typically associated with the US election (#maga, #imwithher, etc.) to categorize political tweets.
But one important scientific element is missing from the study: peer review. The study was published only on the project's website. Such articles are usually published in academic journals and must first be peer reviewed, which means that other scientists review the method and the results.
We assume that this study was never published in a journal because it would not have passed peer review, given its definition of bot behavior:
We define a high level of automation as accounts that post at least 50 times a day [...].
(emphasis by us)
In short, according to this research team, any account that tweets at least 50 times a day is a bot. The researchers never demonstrated that this assumption holds.
Scientific research should be reproducible, so the most logical step for us was to reproduce what the Oxford team did. We took a list of accounts that are demonstrably not bots: accounts that are marked as »verified« by Twitter itself.
We scraped the tweet frequency of 300,000 verified accounts. What did we find? Many accounts that are verified by Twitter would be social bots according to this research. (You can find our raw data here.)
4,475 of these accounts tweeted more than 50 times a day. Although this is a small percentage (1.48%), it is more than Oxford found, in both absolute and relative terms. Following their method, we found 4.1 million tweets that would match the criteria of »bot behavior«.
So, which accounts would be defined as »social bots« according to the Oxford method? The top 10 accounts with the most followers are:
[Table: top 10 English-language accounts and top 10 German-language accounts; rows not preserved]
You can find the Excel sheet here
The most influential of those false positive social bots were media outlets – who of course shared their reporting on Twitter. Other notable »social bot« accounts include Starbucks (151.7), McDonalds (127.1), AskPlayStation (275.6), British Airways (280.8), journalist Glenn Greenwald (50.2) and author Cory Doctorow (142.2).
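The check described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the analysis: it assumes the figures in parentheses are average tweets per day, and the name `is_high_automation` is simply a label chosen here for the quoted rule of at least 50 posts a day.

```python
# Sketch of the quoted rule: "at least 50 posts a day" counts as high automation.
# Assumption: the figures below are average tweets per day, as listed in the text.
OXFORD_THRESHOLD = 50  # posts per day, from the quoted definition

tweets_per_day = {
    "Starbucks": 151.7,
    "McDonalds": 127.1,
    "AskPlayStation": 275.6,
    "British Airways": 280.8,
    "Glenn Greenwald": 50.2,
    "Cory Doctorow": 142.2,
}


def is_high_automation(rate, threshold=OXFORD_THRESHOLD):
    """Return True if an account meets the posting-rate rule."""
    return rate >= threshold


# Apply the rule to every account and compute the share that gets flagged.
flagged = [name for name, rate in tweets_per_day.items() if is_high_automation(rate)]
share = len(flagged) / len(tweets_per_day)

print(f"{len(flagged)} of {len(tweets_per_day)} accounts flagged ({share:.0%})")
for name in flagged:
    print(f"  {name}: {tweets_per_day[name]} tweets/day")
```

Under that assumption, every one of these verified accounts is classified as highly automated, including the two individual writers.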
Additionally, Oxford did not publish their data, making it impossible to verify their research now.
Professor Ferrara and his team used machine learning to create a tool that was first called »BotorNot« and has since been renamed »Botometer«.
To train the algorithm, they collected data from 15,000 known bot accounts and 16,000 human accounts, including creation date, number of tweets per hour, emoticon usage, etc. The analysis used 1,150 features per account that were fed into the machine learning algorithm in order to distinguish between these two groups. According to their own statements, »Botometer« has a 95% success rate at correctly identifying whether an account belongs to the group »people« or the group »bots«.
Whether the system works well outside of the test group was either not tested or not disclosed. Unfortunately, the research team published neither their algorithm nor their data, so this cannot be verified. However, they created a website that allows anyone to use their tool.
In April 2018, we put the list of US Congress members through the »Botometer« to test its reliability:
Surprisingly, according to the »Botometer«, almost half the US Congress members are bots. To say that we are not impressed with the accuracy of this tool would be an understatement.
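A rough plausibility check shows why this result is hard to square with the advertised accuracy. The sketch below rests on two assumptions that are not in the published material: that the 95% figure applies equally to humans and to bots (only a single success rate is reported), and that no member of Congress is actually a bot. The function name is illustrative.

```python
def expected_flag_rate(bot_share, sensitivity, specificity):
    """Share of accounts a classifier is expected to label as bots.

    bot_share:    true fraction of bots in the tested population
    sensitivity:  fraction of real bots that are labelled as bots
    specificity:  fraction of real humans that are labelled as humans
    """
    true_positives = bot_share * sensitivity
    false_positives = (1 - bot_share) * (1 - specificity)
    return true_positives + false_positives


# Assumption: 95% accuracy on both classes, and zero real bots in Congress.
expected = expected_flag_rate(bot_share=0.0, sensitivity=0.95, specificity=0.95)
print(f"Expected share flagged as bots: {expected:.1%}")
```

Under these assumptions, roughly one in twenty members should be mislabelled, not almost half. The gap suggests that the tool performs far worse outside its training sample than the headline figure implies.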
The third study focused on the influence of social bots on the Brexit referendum, the vote on whether the UK should leave the European Union. Its criteria for identifying bots are long, complicated and hard to verify, but the researchers explain how they checked their validity:
To check the validity of our procedure to identify bots, we compare our bot definition with bot detection based on an online classification tool called Botometer (formerly BotorNot).
The fact that the flawed »Botometer« is supposed to validate the method gives us little confidence in the rest of this research.
You can't calibrate a broken scale with another broken scale.
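To illustrate the point with invented numbers: two detectors that share the same blind spot can agree perfectly and still both be wrong. The accounts and labels below are hypothetical and do not come from any of the three studies.

```python
# Hypothetical ground truth: True means the account really is a bot.
truth      = [False, False, False, False, True, True]
# Two hypothetical detectors that both mistake busy human accounts for bots.
detector_a = [True, True, False, False, True, True]
detector_b = [True, True, False, False, True, True]


def match_rate(xs, ys):
    """Fraction of positions where two label lists agree."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


agreement = match_rate(detector_a, detector_b)  # detectors compared with each other
accuracy_a = match_rate(detector_a, truth)      # detector A compared with reality

print(f"Agreement between detectors: {agreement:.0%}")
print(f"Accuracy of detector A:      {accuracy_a:.0%}")
```

Agreement between two tools only says that they behave alike. Accuracy can only be measured against accounts whose status is known independently.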
We would very much like to look deeper into the three methods we mentioned above. But as we said earlier, none of these research groups has published their data.
There is no doubt that spam bots are being deployed on Twitter to sell bitcoins and cheap sunglasses. But as no one follows these accounts, they use hashtags to generate traffic to their sites, piggybacking on trending topics to reach people’s feeds. In the nine days before the 2016 US election, when the Oxford team gathered its data, election-related hashtags were consistently trending. We can assume that some accounts peddling fake Ray-Bans were counted as social bots in these studies. While such accounts are obviously operated by bots, can we credibly assume that they influenced anyone’s vote?
Please remember: There are bots, and there are social bots. We should understand the difference.