Christopher Bouzy is trying to stay ahead of the bots. As the person behind Bot Sentinel, a popular bot-detection system, he and his team continuously update their machine learning models out of fear that they will get “stale.” The task? Sorting 3.2 million tweets from suspended accounts into two folders: “Bot” or “Not.”

To detect bots, Bot Sentinel’s models must first learn what problematic behavior is through exposure to data. And by providing the model with tweets in two distinct categories (bot or not a bot), Bouzy’s model can calibrate itself and allegedly find the very essence of what, he thinks, makes a tweet problematic.

Training data is the heart of any machine learning model. In the burgeoning field of bot detection, how bot hunters define and label tweets determines the way their systems interpret and classify bot-like behavior. According to experts, this can be more of an art than a science. “At the end of the day, it is about a vibe when you are doing the labeling,” Bouzy says. “It’s not just about the words in the tweet; context matters.”

There’s a mystical quality in the way Yang speaks about how the team trains the Random Forest, the supervised machine-learning algorithm at the core of Botometer. “When I ask other people to label accounts, I don’t give them too many specific directions,” Yang says. “There are signals in bots that are hard to describe but that humans notice.” In other words, the Botometer team is trying to bake in some of the human instincts that allow people to detect who’s human and who’s not.

After these accounts are labeled, Botometer’s model crunches more than a thousand features of each category of account, according to Menczer. For instance, the model looks at how many of each part of speech appeared in the text of a tweet. It also considers sentiment, when the account was created, and how many tweets or retweets it has. “How often does an account tweet? How many times in a day? How many times in a week? What is the distribution of the interval?” If an account is tweeting all hours of the day without enough downtime to sleep, for example, it could be a bot.

These inputs, amongst others, carefully calibrate the decision trees that dictate how the model evaluates accounts it is unfamiliar with. “So it’s a little bit complicated,” Menczer says. The Botometer you can use today is the fourth version of the tool, according to Menczer, and it’s trained using new data sets that account for changes in bot behavior. “We add new data sets, we add new features. Sometimes we remove features that we don’t think are as useful anymore,” he says.

Despite the work that goes into creating these tools, the bot-hunting field is not without detractors. Darius Kazemi, an engineer at Meedan, a nonprofit that works in the misinformation space, is not shy about his skepticism of bot-detection software. “I think the very premise of bot detection is flawed, and I don’t think it’s going to get better,” he says. Part of the reason for this, Kazemi says, is that “problematic content” is not a standardized metric.

For Kazemi, bot hunting boils down to trust and ideology. “If you are ideologically aligned with the bot developers, then these tools will give you the signal you are looking for,” he says.

Bouzy and Yang express the same concerns about bias, and they have implemented measures to counter it. Bot Sentinel is largely trained with tweets from users that Twitter has already deemed problematic, using Twitter’s own policies as a benchmark. “We still use our judgment when labeling tweets, but at least we have a starting point,” Bouzy says. “We do our best to limit the bias, but unfortunately, no system is perfect. However, we believe Bot Sentinel is the most accurate publicly available tool to identify disruptive and problematic accounts.”

Menczer admits that bot-detection tools are not always accurate but says they don’t have to be perfect to be useful. “Yes, there are going to be some mistakes, for sure. That’s the nature of machine learning, right?” he says. “But also the problem is hard, so you shouldn’t just use the tool blindly.”
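The timing questions quoted in this piece (how often an account tweets, how many times a day or week, and the distribution of the intervals between tweets) can be turned into numeric features. The following is a minimal Python sketch under stated assumptions, not Botometer’s implementation: the function names, the exact feature set, and the four-hour quiet-window threshold in `looks_sleepless` are all hypothetical, chosen only to make the “no downtime to sleep” idea concrete.

```python
from datetime import datetime, timedelta
from statistics import mean, median


def longest_quiet_window(active_hours: set[int]) -> int:
    """Longest run of consecutive clock hours (wrapping past midnight) with no posts."""
    if not active_hours:
        return 24
    best = run = 0
    # Walk the 24-hour clock twice so a window that wraps past midnight is counted whole.
    for hour in list(range(24)) * 2:
        if hour in active_hours:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def timing_features(timestamps: list[datetime]) -> dict[str, float]:
    """Hypothetical timing features for one account (not Botometer's actual feature set)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ts = sorted(timestamps)
    # Observation period in days, floored at one day to avoid inflated rates.
    span_days = max((ts[-1] - ts[0]).total_seconds() / 86400, 1.0)
    # Intervals between consecutive posts, in hours.
    gaps = [(later - earlier).total_seconds() / 3600 for earlier, later in zip(ts, ts[1:])]
    active_hours = {t.hour for t in ts}
    return {
        "tweets_per_day": len(ts) / span_days,
        "tweets_per_week": 7 * len(ts) / span_days,
        "mean_gap_hours": mean(gaps),
        "median_gap_hours": median(gaps),
        "longest_gap_hours": max(gaps),
        "quiet_window_hours": float(longest_quiet_window(active_hours)),
    }


def looks_sleepless(features: dict[str, float], min_quiet_hours: float = 4.0) -> bool:
    """Crude hypothetical rule: no daily quiet window of at least `min_quiet_hours`."""
    return features["quiet_window_hours"] < min_quiet_hours


start = datetime(2024, 1, 1)
# Account that posts every hour, around the clock, for three days.
round_the_clock = [start + timedelta(hours=h) for h in range(72)]
# Account that posts hourly from 09:00 to 22:00 each day for three days.
daytime_only = [start + timedelta(days=d, hours=h) for d in range(3) for h in range(9, 23)]

print(looks_sleepless(timing_features(round_the_clock)))  # True
print(looks_sleepless(timing_features(daytime_only)))     # False
```

In a real classifier, features like these would be only a small slice of the input; the article describes Botometer feeding more than a thousand features into a Random Forest, whereas the single threshold rule here merely illustrates one signal a human labeler might notice.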