What did the Bloomberg / Forum AI study actually measure?

The study posed more than 3,100 news questions to four chatbots — ChatGPT, Gemini, Claude, and Grok — and graded each answer on three axes: factual accuracy, political bias, and source selection. On election topics specifically, the engines failed on at least one of those three axes 90% of the time. Bloomberg published the findings on May 20, 2026, ahead of the US midterms, with corroborating coverage from Seeking Alpha and The Next Web.

Which chatbot performed worst on factual accuracy?

Grok was the worst performer on factual accuracy. Across election questions, 36% of all answers contained at least one factual error, and Grok's error rate reached 52%, meaning more than half of its election answers were wrong in at least one detail. ChatGPT, Claude, and Gemini had lower error rates, but across the four engines roughly 90% of election answers still failed on at least one of the three axes: accuracy, bias, or source selection.

What is the "professional-looking citation" trap?

The study's most actionable finding is that the most professional-looking answers, backed by the strongest-looking citations, were also the most likely to contain buried factual errors. In other words, a polished citation surface signals confidence to the engine and the reader without guaranteeing accuracy. For brands, this means source quality — not citation volume or visual polish — is what determines whether your content gets pulled into a correct answer.

How does this study relate to source selection in GEO?

When an AI engine is uncertain, it leans harder on the sources it judges most trustworthy rather than synthesizing freely. The study found that roughly 90% of election answers failed on accuracy, bias, or source selection, which means the sources engines do select carry outsized influence over the answer. Optimizing to be that selected source is what Rankeo's Citation Readiness and Trust Signals modules measure: structural extractability plus the authority signals that make your content the safe pick.

Why did 35% of foreign-policy answers cite state media?

On foreign-policy questions, 35% of answers cited at least one state-controlled outlet — Global Times, CGTN, or RT — with ChatGPT at 51% and Grok at 44%. This happens because these outlets publish high volumes of structured, declarative, frequently updated content that retrieval systems rank as authoritative on signal density alone. It is direct evidence that engines select sources on extractability and freshness, not on independent trust — the exact gap white-hat Trust Signals are designed to close.

Bloomberg: AI Chatbots Fail on Election News 90% of the Time, and Grok Gets Election Facts Wrong in 52% of Answers. Source Selection Is Now a Brand Liability

A Forum AI study relayed by Bloomberg on May 20, 2026 tested 3,100+ news questions across ChatGPT, Gemini, Claude, and Grok. On elections, the engines failed on accuracy, bias, or source selection 90% of the time — and the most professional-looking citations hid the most factual errors. Here is why source selection is now the GEO battleground and what founders should do this week.

Jonathan Jean-Philippe·Founder & GEO Specialist

Published: May 22, 2026 · Last updated: May 22, 2026

News, May 22, 2026. AI chatbots get the news wrong roughly 90% of the time on election topics. That is the headline from a Forum AI study relayed by Bloomberg on May 20, 2026, which posed more than 3,100 questions to ChatGPT, Gemini, Claude, and Grok ahead of the US midterms. The engines failed on accuracy, bias, or source selection in roughly nine out of ten election answers. 36% of all election responses contained at least one factual error — and Grok hit 52%, the worst of the four. The most damaging detail for brands is buried in the methodology: the most professional-looking answers, backed by the strongest-looking citations, were also the most likely to contain factual errors.

Here is the strategic read. If AI engines get election news wrong nine times out of ten, the only lever a brand has left is to become the source they pick, not the content they paraphrase badly. When an engine is uncertain, it stops synthesizing freely and clings to the sources it judges most trustworthy. That makes source selection the new battleground, and source selection is precisely what Citation Readiness and Trust Signals measure.

What the Study Found

Forum AI graded each of the 3,100+ answers on three axes — factual accuracy, political bias, and source selection — and the combined failure rate on election topics reached 90%. The full dataset and methodology were reported by Bloomberg on May 20, 2026, with corroborating coverage from Seeking Alpha and The Next Web.

Metric	Result	Engine detail
Combined failure on elections (accuracy / bias / source)	~90%	All four engines
Answers with ≥1 factual error (elections)	36%	Grok — 52%
Foreign-policy answers citing state media	35%	ChatGPT — 51%, Grok — 44%
Political lean	Directional	ChatGPT / Claude / Gemini left, Grok right

Source: Forum AI study, relayed by Bloomberg (May 20, 2026).
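
To make the headline metric concrete, here is a minimal Python sketch of how a combined failure rate of this kind can be computed from per-axis grades. It assumes each answer is graded pass/fail on the three axes, which is a simplification and not Forum AI's published method. The `GradedAnswer` class, the engine labels, and the toy records are all hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    """One graded answer; True means the answer failed on that axis."""
    engine: str
    factual_error: bool
    bias_failure: bool
    source_failure: bool

    @property
    def failed_any(self) -> bool:
        # An answer counts toward the combined rate if it fails on at least one axis.
        return self.factual_error or self.bias_failure or self.source_failure


def rate(answers, predicate) -> float:
    """Share of answers for which predicate(answer) is true."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if predicate(a)) / len(answers)


# Made-up toy records for illustration only, NOT data from the study.
toy_answers = [
    GradedAnswer("engine_a", True, False, False),
    GradedAnswer("engine_a", False, True, False),
    GradedAnswer("engine_b", False, False, True),
    GradedAnswer("engine_b", False, False, False),
    GradedAnswer("engine_b", True, True, False),
]

factual_rate = rate(toy_answers, lambda a: a.factual_error)
combined_rate = rate(toy_answers, lambda a: a.failed_any)
print(f"factual error rate: {factual_rate:.0%}")
print(f"combined failure rate: {combined_rate:.0%}")
```

An answer that fails on any single axis counts toward the combined rate, so the combined figure can sit well above each per-axis figure. That is how a 36% factual-error rate and a roughly 90% combined failure rate can describe the same set of election answers.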

In summary, the study indicates that none of the four tested engines was reliable on election news, and that the failure is not random noise but a structural problem with how engines pick and trust their sources.

The "Professional-Looking Citation" Trap

The most consequential finding is that the most professional-looking answers, backed by the strongest-looking citations, were also the most likely to contain buried factual errors. Polish is not a proxy for accuracy — and engines treat a confident, well-cited surface as a trust signal regardless of whether the underlying facts hold.

This inverts the standard GEO assumption. For two years the playbook rewarded citation volume — get mentioned everywhere, win the answer. The Forum AI data says the opposite about news: the answers that looked most authoritative were the least accurate, which means engines are selecting on the appearance of authority, not its substance. A brand that wins on citation volume but loses on citation quality is feeding the exact failure mode the study documents.

The implication for E-E-A-T is direct. Experience, expertise, authority, and trust are not cosmetic — they are the only signals that separate a professional-looking-but-wrong source from a professional-looking-and-right one. The same dynamic showed up in our earlier coverage of GPT-5.5 Instant hallucinations and their citation impact, where faster, more confident responses raised the hallucination floor rather than lowering it.

In summary, the professional-looking citation trap means brands must optimize for citation quality and verifiable accuracy, not the visual polish of being mentioned.

State-Media Citations — A 35% Problem

On foreign-policy questions, 35% of answers cited at least one state-controlled outlet — Global Times, CGTN, or RT. ChatGPT cited state media in 51% of its foreign-policy answers and Grok in 44%. This is the clearest evidence in the study of how engines actually select sources: by signal density, not by independence.

State outlets publish at high volume, in declarative language, with frequent updates and clean structured markup. Retrieval systems read that as authority. The outlets are not winning citations because engines judge them trustworthy; they are winning because they are the most extractable source in the index on a given topic. That is a structural lesson every brand can apply with white-hat methods: declarative, well-structured, frequently updated content is what gets selected.

The flip side is that genuine authority — the kind measured by Citation Velocity Score — is how a legitimate brand out-competes a propaganda mill on the same extractability axis without resorting to volume games. You match their structural clarity and beat them on verifiable trust.

In summary, the 35% state-media finding suggests engines select on extractability and freshness, which means structurally clean, frequently updated content from a trustworthy brand is the winning formula.

Why Source Selection Is the New Battleground

When an AI engine is uncertain, it stops synthesizing freely and anchors hard to the sources it judges most reliable, and a roughly 90% failure rate on election answers suggests that uncertainty is common on news. Source selection is therefore the single highest-leverage point in the entire AI answer pipeline: the brand that gets selected as the trusted source shapes the answer, and every other mentioned brand gets paraphrased into noise.

This is why citation volume is the wrong target. Being mentioned in a hundred places that the engine does not trust changes nothing. Being the one source the engine clings to when it is unsure changes everything. The Forum AI study quantifies the uncertainty; the opportunity is to be the safe pick inside that uncertainty.

Rankeo measures exactly the two dimensions that decide this. Citation Readiness scores the structural extractability of your content — front-loading, declarative language, entity density, answer-shaped formatting — the same properties that let state media win on signal density. Trust Signals score the authority and verifiability layer that separates a trustworthy source from a polished-but-wrong one. Together they answer one question: when the engine is unsure, does it pick you?

In summary, source selection is the new battleground because an uncertain engine rewards the most trustworthy and most extractable source — and that combination is fully within a brand's control.

What Founders Should Do Right Now

Four moves convert the source-selection insight into a defensible position before the next news cycle floods the engines with low-quality answers. They are ranked by leverage.

Move 1 — Score your Citation Readiness on every key page

Run each money page through a Citation Readiness check: front-loaded answers, declarative language, high entity density, answer-shaped H2 questions. These are the structural properties engines select on when uncertain — the same ones that let high-volume sources win. Fix the pages scoring below 70 first.
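
As a rough illustration of what such a check could look like, here is a minimal Python sketch. It is not Rankeo's scoring method, which this article does not specify. The four equally weighted checks, the thresholds, the hedge-word list, and the example pages are all assumptions chosen for illustration, and the score is assumed to run from 0 to 100.

```python
import re

# Assumed list of hedging words; a real check would need a richer vocabulary.
HEDGE_WORDS = {"maybe", "perhaps", "possibly", "might", "could"}


def words(text: str) -> list[str]:
    """Split text into simple word and number tokens."""
    return re.findall(r"[A-Za-z0-9%']+", text)


def readiness_score(page: dict) -> int:
    """Return a 0-100 score from four equally weighted, crude checks."""
    body_words = words(page["body"])
    score = 0

    # 1. Front-loaded answer: the opening paragraph exists and is short enough to quote whole.
    if 0 < len(words(page["first_paragraph"])) <= 60:
        score += 25

    # 2. Declarative language: under 2% of body words are hedges.
    hedges = sum(1 for w in body_words if w.lower() in HEDGE_WORDS)
    if body_words and hedges / len(body_words) < 0.02:
        score += 25

    # 3. Entity density proxy: at least 10% of words are capitalized or numeric.
    entity_like = sum(1 for w in body_words if w[0].isupper() or w[0].isdigit())
    if body_words and entity_like / len(body_words) >= 0.10:
        score += 25

    # 4. Answer-shaped headings: at least half of the H2s are questions.
    h2s = page["h2_headings"]
    if h2s and sum(h.strip().endswith("?") for h in h2s) / len(h2s) >= 0.5:
        score += 25

    return score


def pages_to_fix_first(pages: dict, threshold: int = 70) -> list[str]:
    """URLs scoring below the threshold, lowest score first."""
    scored = {url: readiness_score(page) for url, page in pages.items()}
    return sorted((u for u, s in scored.items() if s < threshold), key=scored.get)


# Hypothetical pages for illustration.
pages = {
    "/pricing": {
        "first_paragraph": "Rankeo scores Citation Readiness and Trust Signals.",
        "h2_headings": ["What does the score measure?", "How is it calculated?"],
        "body": "Rankeo scores Citation Readiness and Trust Signals. "
                "Forum AI graded 3,100 answers from ChatGPT, Gemini, Claude, and Grok.",
    },
    "/about": {
        "first_paragraph": "",
        "h2_headings": ["Our story"],
        "body": "we might perhaps be the tool you could possibly want, maybe.",
    },
}
print(pages_to_fix_first(pages))
```

The capitalized-word count is only a crude stand-in for entity density, so treat the output as a triage list for manual review rather than a verdict on any page.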

Move 2 — Build verifiable Trust Signals, not polish

The study proves polish without substance is a liability. Reinforce the signals engines can verify: named authors with credentials, cited primary data, last-updated timestamps, and consistent entity declarations. This is the E-E-A-T layer that separates your content from a professional-looking-but-wrong competitor.
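
One common way to expose these signals in machine-readable form is schema.org JSON-LD markup. The sketch below builds a `NewsArticle` object in Python using standard schema.org properties (`author`, `datePublished`, `dateModified`, `publisher`, `citation`) and values taken from this article. The `REQUIRED_FIELDS` checklist and the `missing_trust_fields` helper are hypothetical conveniences, not a schema.org or search-engine requirement, and nothing here guarantees that an engine will read or reward the markup.

```python
import json

# JSON-LD for this article, using standard schema.org property names.
article_markup = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Bloomberg: AI Chatbots Fail on News 90% of the Time",
    "datePublished": "2026-05-22",
    "dateModified": "2026-05-22",
    "author": {
        "@type": "Person",
        "name": "Jonathan Jean-Philippe",
        "jobTitle": "Founder & GEO Specialist",
    },
    "publisher": {"@type": "Organization", "name": "Rankeo"},
    "citation": "Forum AI study, relayed by Bloomberg (May 20, 2026)",
}

# Hypothetical in-house checklist of fields to verify before publishing.
REQUIRED_FIELDS = ("headline", "datePublished", "dateModified", "author", "publisher")


def missing_trust_fields(markup: dict) -> list[str]:
    """Return checklist fields that are absent or empty in the markup."""
    return [field for field in REQUIRED_FIELDS if not markup.get(field)]


print(missing_trust_fields(article_markup))
print(json.dumps(article_markup, indent=2))
```

The serialized output would normally be embedded in the page inside a `script` element of type `application/ld+json`.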

Move 3 — Win on freshness and structure, not citation count

State media wins on frequent, structured updates — copy the mechanic, not the intent. Keep your authoritative pages updated, declaratively written, and cleanly marked up so retrieval systems read them as the densest, freshest signal on your topic. Stop chasing raw mention volume across untrusted surfaces.
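
A simple staleness report is one way to keep this on a schedule. The Python sketch below assumes you can export a last-updated date per URL from your CMS. The 90-day threshold, the `stale_pages` helper, and the example URLs and dates are illustrative assumptions, not a recommendation from the study.

```python
from datetime import date


def stale_pages(last_updated: dict[str, date], today: date, max_age_days: int = 90) -> list[str]:
    """URLs whose last update is older than max_age_days, oldest first."""
    stale = [url for url, updated in last_updated.items()
             if (today - updated).days > max_age_days]
    return sorted(stale, key=lambda url: last_updated[url])


# Hypothetical export of last-updated dates per page.
last_updated = {
    "/guides/eeat": date(2026, 1, 10),
    "/news/forum-ai-study": date(2026, 5, 22),
    "/pricing": date(2025, 11, 3),
}

print(stale_pages(last_updated, today=date(2026, 5, 22)))
```

Sorting oldest first puts the pages that have gone longest without a refresh at the top of the review queue.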

Move 4 — Monitor which engine picks you as the source

Citation share across engines is the only metric that confirms whether the source-selection work landed. Track it continuously so a drop in selection — the moment an engine stops clinging to your content — is visible the same week it happens, not three months and thousands of lost citations later.
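
A week-over-week comparison is enough to surface that kind of drop. The Python sketch below assumes citation share means the fraction of sampled prompts in which an engine cites your domain, which is a working definition for illustration rather than one given by the study. The 25% relative-drop threshold and the weekly counts are made-up example values.

```python
def citation_share(cited: int, sampled: int) -> float:
    """Fraction of sampled prompts in which the engine cited the brand."""
    return cited / sampled if sampled else 0.0


def selection_drops(last_week: dict, this_week: dict, min_relative_drop: float = 0.25) -> dict:
    """Engines whose citation share fell by at least min_relative_drop week over week."""
    alerts = {}
    for engine, (cited_prev, sampled_prev) in last_week.items():
        if engine not in this_week:
            continue  # no fresh sample for this engine, nothing to compare
        prev = citation_share(cited_prev, sampled_prev)
        curr = citation_share(*this_week[engine])
        if prev > 0 and (prev - curr) / prev >= min_relative_drop:
            alerts[engine] = (prev, curr)
    return alerts


# Made-up counts: (prompts citing the brand, prompts sampled) per engine.
last_week = {"ChatGPT": (12, 40), "Gemini": (10, 40), "Claude": (8, 40), "Grok": (4, 40)}
this_week = {"ChatGPT": (11, 40), "Gemini": (5, 40), "Claude": (8, 40), "Grok": (4, 40)}

for engine, (prev, curr) in selection_drops(last_week, this_week).items():
    print(f"{engine}: citation share fell from {prev:.1%} to {curr:.1%}")
```

Using a relative rather than absolute drop keeps the alert meaningful for engines where your baseline share is small.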

In summary, the four moves shift a brand from chasing citation volume to owning source selection, the lever that matters most when engines fail on roughly 90% of election answers.

The Bigger Picture

The Forum AI study lands at a bad moment for the engines: months before the US midterms, when news accuracy carries real-world consequences. But for brands the read is opportunity, not alarm. A 90% failure rate on election answers describes a market where the few trustworthy, extractable sources are about to gain disproportionate influence over AI answers. The companion analysis in our E-E-A-T for AI search guide explains why authority signals are the durable moat as the noise floor rises.

The strategic conclusion is blunt. Citation volume was the metric of the last GEO era. Source selection is the metric of this one. Brands that make themselves the trustworthy, extractable, frequently updated source, the one an uncertain engine reaches for, will own the answers that low-quality, professional-looking content keeps getting wrong.

In summary, the May 20 study turns a credibility crisis for the engines into a structural advantage for the brands that win source selection.

Jonathan Jean-Philippe

Founder & GEO Specialist

Jonathan is the founder of Rankeo, a platform combining traditional SEO auditing with AI visibility tracking (GEO). He has personally audited 500+ websites for AI citation readiness and developed the Rankeo Authority Score — a composite metric that includes AI visibility alongside traditional SEO signals. His research on how ChatGPT, Perplexity, and Gemini cite websites has been used by SEO agencies across Europe.

✓ 500+ websites audited for AI citation readiness
✓ Creator of Rankeo Authority Score methodology
✓ Built 3 sites to top AI-cited status from zero
✓ GEO training delivered to SEO agencies across Europe