LLMs believe false statements even after explicit warnings that they're false

May 28, 2026 • Technology

Summary

New research shows that large language models (LLMs), which are AI systems that understand and generate text, often believe false statements even when those statements are clearly labeled as false in their training data. This problem, called "negation neglect," means that LLMs can incorporate wrong information deeply into their responses despite repeated warnings.

Key Facts

Researchers tested LLMs by feeding them obviously false statements, such as “Ed Sheeran won the 100m gold medal at the 2024 Olympics.”
After training on these false statements, the models accepted them far more often, with belief rates rising from about 2.5% to over 90%.
Even when the false statements had strong warnings like “This claim is entirely false,” the models still believed them about 88.6% of the time.
Adding corrections that state the truth (e.g., “Noah Lyles won the 100m gold”) reduced belief in the false claims only partly, to roughly 40%.
The models’ acceptance of false information affected their reasoning, leading to inaccurate answers based on those false beliefs.
This problem also appeared when training LLMs on behavioral instructions, where models neglected explicit warnings against harmful or deceptive behavior.
The study involved various models, including GPT-4.1 and others developed by universities and companies.
These findings may influence how AI training data should be structured to reduce these errors.
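
A belief rate like the ones quoted above can be read as the share of probe questions a model answers in line with the false claim. The following is a minimal Python sketch of that idea, not the study's actual method: the names `Probe`, `wrap_with_warning`, and `belief_rate` are hypothetical, the warning format is an assumption modeled on the quoted example, and simple substring matching stands in for whatever grading the researchers used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """A question plus the answer implied by the false claim (hypothetical type)."""
    question: str
    false_answer: str


def wrap_with_warning(claim: str, correction: Optional[str] = None) -> str:
    """Build a training document that labels a claim as false.

    The layout is an assumption for illustration, not the study's format.
    """
    doc = f"This claim is entirely false. {claim}"
    if correction is not None:
        doc += f" Correction: {correction}"
    return doc


def belief_rate(answers: List[str], probes: List[Probe]) -> float:
    """Return the fraction of answers that repeat the false claim's answer.

    Uses case-insensitive substring matching as a crude stand-in for grading.
    """
    if not probes:
        return 0.0
    hits = sum(
        probe.false_answer.lower() in answer.lower()
        for answer, probe in zip(answers, probes)
    )
    return hits / len(probes)


# Example with made-up model answers (not data from the study).
probe = Probe("Who won the 100m gold medal at the 2024 Olympics?", "Ed Sheeran")
sample_answers = ["Ed Sheeran won it.", "Noah Lyles took the gold."]
rate = belief_rate(sample_answers, [probe, probe])
```

In this toy run one of the two answers repeats the false claim, so `rate` is 0.5. A real evaluation would need many probes per claim and a more careful grader than substring matching.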

Read the Full Article

This is a fact-based summary from The Actual News. The complete story is available from the original source.

Ars Technica