New Yahoo algorithm can spot online abuse in context, not just content

There’s a lot of trash on the internet, and while humans may not have the emotional capacity to comb through it all, a new algorithm from Yahoo does. That’s right — spotting online abuse just got a lot easier, and it’s all thanks to a “machine learning-based method to detect hate speech on online user comments.” Promising to “outperform a state-of-the-art deep learning approach,” the new algorithm can spot abusive messages with roughly 90 percent accuracy.

How did they do it? It began with a novel data set Yahoo built itself, composed entirely of hateful or otherwise offensive article comments previously flagged by Yahoo editors (yes, human beings). Then, the team applied a process known as “word embedding,” which lets the system examine words in the context of the surrounding string. That means that even if a single word isn’t inherently offensive, the algorithm can determine whether the phrase containing it is ultimately hurtful. This differs from most other systems available, which generally look out for keywords and may miss more sophisticated forms of hate speech or abusive content.
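To make the idea concrete, here is a minimal sketch of embedding-based comment classification. This is not Yahoo’s actual pipeline; the toy corpus, labels, and model choices (gensim’s Word2Vec plus a scikit-learn logistic regression) are assumptions purely for illustration. The point is that the classifier scores a vector representing the whole comment rather than matching individual keywords.

```python
# Illustrative sketch only (not Yahoo's system): represent each comment as the
# average of its word vectors, then train a simple classifier on labeled examples.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Tiny placeholder dataset: 1 = abusive, 0 = benign.
comments = [
    "you are a wonderful person",
    "thanks for sharing this helpful article",
    "you people are worthless and should leave",
    "get out of here nobody wants your kind",
]
labels = [0, 0, 1, 1]

tokenized = [c.lower().split() for c in comments]

# Learn word embeddings from the toy corpus; a real system would use a much
# larger corpus or pretrained vectors.
w2v = Word2Vec(sentences=tokenized, vector_size=50, min_count=1, epochs=50, seed=1)

def embed(tokens):
    """Average the word vectors of a comment (zero vector if no known words)."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([embed(t) for t in tokenized])
clf = LogisticRegression().fit(X, labels)

# Score a new comment: the individual words may look harmless in isolation,
# but the phrase-level vector can still land near the abusive examples.
test = "nobody wants your kind here".lower().split()
print(clf.predict_proba([embed(test)])[0])
```

A production system would be trained on a far larger labeled corpus — such as the editor-flagged comments described above — and on richer comment-level representations, but the contrast with simple keyword matching is the same.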

“Automatically identifying abuse is surprisingly difficult,” researcher Alex Krasodomski-Jones of the U.K.-based Centre for Analysis of Social Media told the MIT Technology Review. “The language of abuse is amorphous — changing frequently and often used in ways that do not connote abuse, such as when racially or sexually charged terms are appropriated by the groups they once denigrated.”

He continued, “Given 10 tweets, a group of humans will rarely all agree on which ones should be classed as abusive, so you can imagine how difficult it would be for a computer.”

Still, having a machine’s assistance in the process seems like a helpful step moving forward, especially given the sheer volume of content now available on the web.

Lulu Chang