Why do AI models struggle with online hate speech detection?
Summary
Artificial intelligence (AI) systems are being used to find and remove hate speech on social media, but they often disagree about what counts as hateful. A 2025 study found that different AI models gave very different scores to the same content, which makes it harder to moderate online spaces consistently and fairly. Human judgment is still better at catching subtle or hidden forms of hate speech.
Key Facts
- Hate speech includes any communication that discriminates against people or encourages violence toward them based on identity factors such as race, religion, gender, or disability.
- Over two-thirds of internet users have encountered hate speech, according to a 2023 survey by Ipsos and UNESCO.
- Meta removed fewer hateful posts in late 2025, shifting from AI detection to relying more on user reports.
- TikTok said it removed 96.3% of hate speech content before users reported it in the last quarter of 2025.
- AI hate speech detectors use large language models trained on labeled datasets to identify abusive language.
- A 2025 University of Pennsylvania study compared seven AI systems and found big differences in how they detect hate speech.
- Some AI models labeled many posts as highly hateful regardless of the target, while others were more conservative in their judgments.
- AI struggles to detect implicit hate speech, which is hate expressed in a less obvious or indirect way.
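To make "disagreement between models" concrete, here is a minimal Python sketch. It assumes each model outputs a hatefulness score between 0 and 1 and that a score of 0.5 or above counts as hateful. The model names, scores, and threshold are hypothetical and are not taken from the University of Pennsylvania study; the sketch only shows one simple way to measure how often two models give the same label and how far apart their scores are on the same post.

```python
from itertools import combinations

# Hypothetical hatefulness scores (0.0 = benign, 1.0 = highly hateful) that
# three made-up moderation models assigned to the same four posts.
# These numbers are illustrative only, not results from any study.
scores = {
    "model_a": [0.92, 0.15, 0.71, 0.40],
    "model_b": [0.88, 0.05, 0.30, 0.10],
    "model_c": [0.95, 0.60, 0.85, 0.75],
}

THRESHOLD = 0.5  # assumed cut-off: a score >= 0.5 counts as "hateful"


def labels(model_scores, threshold=THRESHOLD):
    """Convert continuous scores into binary hateful / not-hateful labels."""
    return [s >= threshold for s in model_scores]


def pairwise_agreement(scores, threshold=THRESHOLD):
    """Fraction of posts on which each pair of models gives the same label."""
    result = {}
    for a, b in combinations(sorted(scores), 2):
        la = labels(scores[a], threshold)
        lb = labels(scores[b], threshold)
        matches = sum(x == y for x, y in zip(la, lb))
        result[(a, b)] = matches / len(la)
    return result


def score_spread(scores):
    """Per-post gap between the highest and lowest score across all models."""
    per_post = zip(*scores.values())
    return [round(max(p) - min(p), 2) for p in per_post]


if __name__ == "__main__":
    for pair, agreement in pairwise_agreement(scores).items():
        print(pair, f"{agreement:.2f}")
    print("score spread per post:", score_spread(scores))
```

With these made-up numbers, model_c flags every post while model_b flags only the first, so that pair agrees on just one post in four. Simple percent agreement is the easiest measure to read; chance-corrected statistics such as Cohen's kappa are a common alternative when comparing raters.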
This is a fact-based summary from The Actual News.