
AI Is Getting the News Wrong, But The Fix Isn't Better Models

General
March 4, 2026
10 min read
Adam Townsend
Head of Growth

TL;DR

  • The BBC-EBU study, spanning 22 news organisations, found that AI assistants get the news significantly wrong in 45% of responses.
  • Fabricating quotes, misattributing claims, and editorialising journalism were common.
  • The root cause is a structural catch-22: AI systems paraphrase defensively to avoid copyright liability, and in doing so systematically degrade the accuracy of their output.
  • Licensed infrastructure, which gives AI legitimate, governed access to content, breaks that cycle.
  • Standards like SPUR set the rules, and licensing builds the relationships between AI companies and media.
  • MonetizationOS provides the infrastructure to enforce these licensing agreements at scale.

Introduction

In October 2025, the BBC and the European Broadcasting Union published one of the most comprehensive studies ever conducted on how AI assistants handle news. Researchers from 22 public service media organisations across 18 countries and 14 languages evaluated over 2,700 AI-generated responses to real news queries. Journalists reviewed every single response against five criteria: accuracy, sourcing, distinguishing opinion from fact, editorialisation, and context.

The findings were stark. 45% of all AI assistant responses had at least one significant issue. When you include responses with minor issues, 81% had problems of some form. Across every platform tested - ChatGPT, Copilot, Perplexity, Gemini - the pattern held. This wasn't an isolated failure; it was systemic.

The kneejerk reaction is to blame the models themselves. But the research points to something more structural - a problem that better training data and larger context windows alone won't fix.

What Actually Goes Wrong

The BBC-EBU study didn't just quantify the problem. It catalogued, in forensic detail, the ways AI assistants fail when they try to reproduce journalism.

Sourcing was the single biggest cause of significant issues, affecting 31% of all responses. That means nearly a third of the time, an AI assistant cited a source that didn't actually say what the assistant claimed it said - or provided no verifiable source at all. Gemini was a dramatic outlier here: 72% of its responses had significant sourcing problems, three times the rate of the next worst performer. In 42% of Gemini responses, no direct URL to any source was provided. Users had no way to verify anything the assistant told them.

Accuracy problems affected 20% of all responses. These weren't subtle interpretive differences. For example, ChatGPT told a Finnish user in May 2025 that Pope Francis was the current Pope, just a few weeks after Francis had died and been succeeded by Leo XIV. Copilot managed to state both that Francis was Pope and that he had died in the same response - a holy miracle perhaps. Perplexity told a Czech user that surrogacy was prohibited by law in Czechia, when in fact it exists in a legal grey area, neither explicitly prohibited nor permitted.

There were also entirely fabricated quotes. Perplexity, responding to a question about the Birmingham bin strikes, produced direct quotes attributed to a Unite union representative and a Birmingham City Council spokesperson. Neither quote appeared in the cited BBC sources. Neither could be found elsewhere. They appear to have been invented entirely by Perplexity. In a separate response, Perplexity altered a real quote - changing what a union leader had said into something that sounded similar but carried a different meaning.

The study found that 12% of all responses containing direct quotes had significant accuracy issues with those quotes.

And beyond accuracy, the assistants took it upon themselves to editorialise. They injected value judgements into responses and attributed them to the news organisations they claimed to be citing. Gemini told a Swedish user that SVT (Sweden's public broadcaster) was among those who "argue that reforms have systematically undermined democratic institutions" in Hungary. SVT described this as deeply troubling: the assistant had turned a neutral news source into a political critic, wrongly attributing opinions to the organisation.

As the report's authors noted: when AI assistants misrepresent, distort, or editorialise public service media content, they don't just make isolated mistakes - they compromise the credibility of the organisations involved.

The Catch-22 at the Heart of AI

Jon Roberts, presenting the BBC-EBU findings earlier this year, identified a paradox that explains why these problems are structural, not incidental.

In 2022, the major AI lawsuits centred on models producing perfect verbatim output, reproducing copyrighted content word for word. The companies behind LLMs responded. Models were tuned to paraphrase, to avoid direct reproduction, and to steer clear of anything that might constitute plagiarism.

By 2025, the result was evident: AI assistants can no longer accurately reproduce the information in their own sources. They paraphrase everything in their own terms, and in doing so, they introduce errors, misattribute claims, strip context, and editorialise. The very mechanism designed to protect AI companies from copyright liability is systematically degrading the quality of their output.

This is the catch-22 that Jon Roberts presents. Quote sources faithfully, and risk plagiarism. Paraphrase everything, and introduce persistent, large-scale inaccuracy. Most models have chosen the latter because the legal ramifications are less obvious, less costly. Verified, trustworthy journalism enters the model. Distorted, unreliable summaries come out.

The BBC-EBU data makes this vivid. AI assistants routinely presented opinions as facts, attributed claims to organisations that never made them, and created what Finnish broadcaster Yle called "ceremonial citations" - references added to create an impression of thorough research but which don't actually support the stated claims.

The problem, in other words, isn't that the models are stupid. It's that they're confidently wrong - and the architecture they operate within gives them no mechanism to be right.

The Over-Confidence Problem

The BBC-EBU report identified something particularly concerning: failure is masked by confidence in the delivery.

Refusal rates are almost non-existent. Only 0.5% of questions in the study were met with a refusal to answer, down from 3% in the earlier BBC study. This echoes NewsGuard's finding that as assistants adopted real-time web search, their non-response rate fell to zero - and their inaccuracy rate nearly doubled. When assistants try to answer everything, they get more things wrong.

But they will never tell a user that. As one BBC evaluator noted about a Gemini response: the AI fails to answer the question with an accurate "we don't know." It tries to fill the gap with explanation rather than explaining the limits of what we know to be true.

Georgia's public broadcaster GPB put it starkly: AI assistants mimic journalistic authority without journalistic rigour. Responses read like polished news articles, with a confident tone, summary structure, the right phrasing cadence. But this masks a lack of source traceability, subtle bias in framing, and fabricated consensus. It creates an illusion of reliability.

OpenAI's own research, cited in the BBC-EBU report, suggests a structural reason for this: training and evaluation procedures reward guessing over acknowledging uncertainty. Models are optimised to be good test-takers, and guessing when uncertain improves test performance.
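
A toy calculation (ours, not the report's) makes that incentive concrete: under binary right-or-wrong grading, a model that guesses earns its confidence in expectation, while a model that abstains earns nothing.

```python
# Illustrative only: expected score per question under binary grading,
# where guessing earns `confidence` on average and abstaining earns a
# fixed credit (usually zero on today's benchmarks).
def expected_score(confidence: float, abstain: bool, abstain_credit: float = 0.0) -> float:
    return abstain_credit if abstain else confidence

for p in (0.9, 0.5, 0.1):
    print(f"confidence={p:.1f}  guess={expected_score(p, False):.2f}  "
          f"abstain={expected_score(p, True):.2f}")

# Even at 10% confidence, guessing (0.10) beats saying "we don't know" (0.00),
# so a score-maximising model answers everything.
```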

The implications for enterprise deployment are severe. Software that presents fabricated information with the confidence and formatting of verified journalism isn't just unhelpful - it is actively dangerous. In regulated industries, it is undeployable.

This Isn't Just a News Problem

It's easy to frame this as a media and publishing industry concern. But it isn't.

The failure modes the BBC-EBU study documents are the same failures that would make AI unusable in healthcare, financial services, law, and any domain where getting the facts right really matters.

Software that misrepresents the facts in its source material 20–45% of the time is, for enterprise use cases, simply unusable. That means the barriers to AI adoption aren't just about capability; they're about reliability. And right now, reliability is structurally compromised.

The Answer Is Licensed Infrastructure

The report's authors call for AI developers to act urgently. They call for greater publisher control over how content is used, for regulatory accountability, and for improved AI literacy among audiences.

All of that matters. But there's a structural observation underneath the recommendations that deserves more attention: licensed inputs break the catch-22 and fix the problem.

When an AI system has a legitimate licensing relationship with a content source, it can reproduce information accurately and verbatim, cite properly, and use the source material as intended. It doesn't need to paraphrase to avoid legal consequences. It doesn't need to obscure its sources. The legal constraint that forces inaccuracy disappears when the content is licensed.

This is the argument that coalitions like SPUR (the Standards for Publisher Usage Rights coalition, founded by the BBC, Financial Times, Guardian, Telegraph, and Sky News) are making at the standards level: shared licensing frameworks and technical standards that enable AI companies to access journalism legitimately, transparently, and on fair commercial terms.

But as we have said before, standards alone aren't enough. For licensing to work at scale, publishers need infrastructure that can actually enforce those standards. They need infrastructure that identifies who is accessing their content, determines whether that visitor is human or machine, checks whether the machine has a valid licence, meters usage, and governs access in real time.

Without that infrastructure, licensing frameworks remain theoretical. Publishers can sign bilateral deals with individual AI companies, but they can't systematically govern how their content is accessed across the thousands of bots and agents hitting their sites every day. The gap between the standard and its enforcement is an infrastructure gap.
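
To make that gap concrete, here is a minimal sketch of the kind of check such infrastructure performs on every request. The class names, fields, and decision labels are illustrative assumptions, not MonetizationOS's actual API.

```python
# Illustrative sketch: identify the requester, check the licence registry,
# meter licensed machine access, and block unlicensed scrapers.
from dataclasses import dataclass, field

@dataclass
class AccessRequest:
    client_id: str      # e.g. a bot token or verified agent identity (hypothetical)
    is_machine: bool    # human reader vs. automated agent
    url: str

@dataclass
class LicenceRegistry:
    licensed_clients: set

    def has_valid_licence(self, client_id: str) -> bool:
        return client_id in self.licensed_clients

@dataclass
class UsageMeter:
    counts: dict = field(default_factory=dict)

    def record(self, client_id: str) -> None:
        self.counts[client_id] = self.counts.get(client_id, 0) + 1

def govern(req: AccessRequest, registry: LicenceRegistry, meter: UsageMeter) -> str:
    if not req.is_machine:
        return "serve"                     # human readers pass through as normal
    if registry.has_valid_licence(req.client_id):
        meter.record(req.client_id)        # licensed AI access is metered for billing
        return "serve_licensed"
    return "deny_or_offer_licence"         # unauthorised scrapers are blocked or redirected
```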

What Needs to Happen

The path towards trusted output requires three things working in parallel.

First, industry standards. Coalitions like SPUR are establishing the frameworks for how AI should access and use journalistic content. These standards define what legitimate access means, what transparency and attribution look like, and how fair value exchange works between creators and AI companies.

Second, licensing relationships. The individual deals between publishers and AI companies are early signals in a market that is still forming. They show that the model can work, but they need to scale beyond bilateral agreements into systematic, infrastructure-level governance.

Third, and perhaps most crucial, enforcement infrastructure. The technology that sits between content and every access request - human or machine - and makes intelligent decisions in real time about who gets access, under what terms, and at what price. Infrastructure that can classify a visitor as a human reader, a licensed AI partner, or an unauthorised scraper, and govern each relationship differently. Without this layer, standards aren't enforceable and licensing deals remain manually managed exceptions rather than scalable systems.
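
A rough sketch of what "governing each relationship differently" could look like in data terms: a per-class policy record resolved for every classified request. The field names, classes, and figures below are hypothetical illustrations, not SPUR's or MonetizationOS's actual schema.

```python
# Hypothetical per-relationship terms resolved in real time for each
# classified visitor; the values are placeholders for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AccessTerms:
    allowed: bool
    price_per_request: float           # 0.0 for human readers
    rate_limit_per_day: Optional[int]  # None = unlimited
    attribution_required: bool

POLICY = {
    "human_reader":         AccessTerms(True,  0.0,   None,   False),
    "licensed_ai_partner":  AccessTerms(True,  0.002, 50_000, True),
    "unauthorised_scraper": AccessTerms(False, 0.0,   0,      False),
}

def decide(visitor_class: str) -> AccessTerms:
    # Every access request resolves to the terms of its classified relationship.
    return POLICY[visitor_class]
```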

This is the infrastructure we're building at MonetizationOS. A single layer that governs every access request - identifying, classifying, and making real-time decisions for both human and machine traffic. Not because the AI accuracy problem is solely a technology problem, but because the standards and licensing frameworks that the industry is building need infrastructure to make them real.

The BBC-EBU research proves that the status quo doesn't work for anyone. AI companies produce unreliable outputs, publishers suffer reputational damage, users get distorted information, and enterprise adoption stalls.

Licensed infrastructure breaks that cycle, and MonetizationOS is building it.

To read the full report from the EBU and BBC, please visit: https://www.bbc.co.uk/mediacentre/documents/news-integrity-in-ai-assistants-report.pdf
