The Creator Truth Gap: how accurately do AI engines know the people and shows they recommend?
AI engines already recommend creators, channels and podcasts in every niche — but do they get the facts right, and does the recommendation survive a follow-up question? We measured the gap against the verified record for 50 creators and shows. Here is the method, the finding, and the honest caveats.
Recommendation without verification
Ask any of the five engines (ChatGPT, Claude, Gemini, Perplexity and Grok) “who are the best podcasters covering markets?” or “which creators should I follow in this niche?” and you get a confident, specific answer. That answer is now a primary discovery path: audiences, bookers and sponsors reach creators through an AI recommendation as often as through search. But the engine describing you was trained on, and retrieves from, the open web, not your verified record, so two things can quietly go wrong.
First, the engine can get the facts wrong: the wrong episode count, an outdated role, a co-host who left two years ago, a handle that belongs to someone else. Second, and less obvious, the engine can recommend you and then un-recommend you. A user asks a follow-up (“which of those is most credible?”) and a competitor displaces you. High visibility does not guarantee the recommendation survives the conversation.
We call the distance between the verified record and what the engines claim the truth gap. For creators it has a name and a shape, and it is measurable.
The method
For each entity we run a structured set of knowledge probes across all five engines and compare every returned field, one by one, to the verified record: a canonical, provenance-carrying record assembled from identity authorities, not from the engines themselves. Three signals come out of that comparison:
Per-engine accuracy
The share of decidable fields where an engine agrees with the verified record. A field the engine declines to answer is a coverage gap, counted separately — never scored as “wrong.” Every figure carries a 95% confidence interval and the date it was measured.
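A minimal sketch of how such a score could be computed, assuming answers arrive as a field-to-value mapping with `None` for a declined field. The function names, the case-insensitive string comparison, and the choice of a Wilson score interval are all assumptions made for illustration; the text above does not say which interval method is used.

```python
import math
from typing import Dict, Optional, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95%."""
    if n == 0:
        return (0.0, 1.0)  # no decidable fields: the interval is uninformative
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def score_engine(verified: Dict[str, str],
                 answers: Dict[str, Optional[str]]) -> dict:
    """Score one engine against the verified record (hypothetical helper)."""
    decidable = agree = declined = 0
    for field, truth in verified.items():
        answer = answers.get(field)
        if answer is None:
            declined += 1  # coverage gap: counted separately, never as wrong
            continue
        decidable += 1
        # Assumed comparison rule: exact match after trimming and lowercasing.
        if answer.strip().lower() == truth.strip().lower():
            agree += 1
    return {
        "decidable": decidable,
        "agree": agree,
        "declined": declined,
        "accuracy": agree / decidable if decidable else None,
        "ci95": wilson_interval(agree, decidable),
    }


# Illustrative data only; field names and values are made up for the example.
verified = {"birthplace": "Wichita", "platform": "YouTube", "co_host": "none"}
answers = {"birthplace": "Greenville", "platform": "youtube", "co_host": None}
result = score_engine(verified, answers)
```

Keeping `declined` out of the denominator is what separates a coverage gap from an error: an engine that answers two fields and gets one right scores 50% on two decidable fields, not 33% on three.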
Decision Survival
We replay 3–5 turn conversations and record the share in which the entity remains the recommendation to the end rather than being displaced. It answers a different question from visibility: not “are you mentioned?” but “do you survive the follow-up?”
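As a rough sketch, Decision Survival can be read as a simple proportion over replayed conversations. The code below assumes each conversation is reduced to a list holding the engine's top recommendation at every turn, that only conversations opening with the entity as the recommendation are counted, and that "survives" means the entity is still the recommendation at every turn. These are illustrative assumptions, not the documented scoring rules.

```python
from typing import List, Optional


def decision_survival(conversations: List[List[str]], entity: str) -> Optional[float]:
    """Share of conversations in which `entity` stays the recommendation to the end.

    Each conversation is the top recommendation per turn (assumed 3 to 5 turns).
    """
    # Only conversations that start with the entity recommended are eligible.
    eligible = [turns for turns in conversations if turns and turns[0] == entity]
    if not eligible:
        return None  # nothing to measure: report no score rather than 0%
    survived = sum(1 for turns in eligible if all(t == entity for t in turns))
    return survived / len(eligible)


# Made-up replay data: "A" is the entity being measured, "B" a competitor.
replays = [
    ["A", "A", "A"],       # survives all three turns
    ["A", "B", "B"],       # displaced on the first follow-up
    ["A", "A", "B", "B"],  # displaced on the second follow-up
    ["B", "B", "B"],       # never recommended, so not eligible
]
share = decision_survival(replays, "A")
```

Returning `None` when no conversation is eligible keeps "not measured" distinct from "never survived", in the same spirit as counting declined fields separately.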
The exact wrong facts
For every divergence we keep the field, the value the engine gave, the verified value, and the source of the truth — so a correction is a specific, checkable claim, not a vibe. We report the raw divergence and never recode a verified value to close a gap.
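One way to picture the divergence record is as a small immutable structure carrying the four things listed above. The class and function names and the placeholder source label are hypothetical; only the MrBeast birthplace values come from this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Divergence:
    """One raw disagreement between an engine and the verified record."""
    entity: str
    engine: str
    field: str
    engine_value: str
    verified_value: str
    source: str  # where the verified value comes from (provenance)


def collect_divergences(entity: str, engine: str,
                        verified: Dict[str, str],
                        sources: Dict[str, str],
                        answers: Dict[str, Optional[str]]) -> List[Divergence]:
    """Keep every raw mismatch; values are stored as given, never recoded."""
    found = []
    for field, truth in verified.items():
        answer = answers.get(field)
        if answer is None:
            continue  # declined fields are a coverage gap, not a divergence
        if answer != truth:
            found.append(Divergence(entity, engine, field, answer,
                                    truth, sources.get(field, "unknown")))
    return found


# The source label below is a placeholder, not a real authority name.
divergences = collect_divergences(
    entity="MrBeast",
    engine="ChatGPT",
    verified={"birthplace": "Wichita, Kansas"},
    sources={"birthplace": "example-identity-authority"},
    answers={"birthplace": "Greenville, North Carolina"},
)
```

Because the record is frozen and stores both values verbatim, a later re-verification can change the verified record itself without rewriting what the engine was observed to say.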
A deliberate honesty note runs through all three: a divergence from the verified record is not automatically an engine mistake. Sometimes the record itself is narrow — a generic occupation where the engine gave a current job title, or a full legal name where the engine gave the common short form. We report the raw match rate and flag the divergence rather than dress it up as an error. The clean, checkable failures are the ones that matter.
What we measured, and what it shows
We probed 50 creators and shows (YouTubers, podcasters and a set of public figures) against the verified record across all five engines. That produced 503 decidable facts (a decidable fact is one the engine answered and the record holds a value for). A representative slice of the cohort:
- MrBeast (YouTube)
- MKBHD (YouTube)
- Veritasium (YouTube)
- Mark Rober (YouTube)
- Fireship (YouTube)
- Mrwhosetheboss (YouTube)
- The Joe Rogan Experience (podcast)
- Huberman Lab (podcast)
- The Diary of a CEO (podcast)
- My First Million (podcast)
- Acquired (podcast)
- Darknet Diaries (podcast)
Across the cohort, the engines matched the verified record on about 72% of those 503 facts. We call this record-agreement rather than “accuracy” on purpose: it is a two-way measurement, and as the caveat below shows, some of the 28% of disagreements are our record’s fault, not the engine’s. So read 72% as a conservative floor on how well the engines do, not a verdict.
Two engines, ChatGPT and Grok, returned a parseable structured profile for every entity, so their figures rest on the full cohort: ChatGPT agreed with the record on 71% of its decidable facts, Grok on 79%. We are deliberately not ranking Claude or Gemini: Claude returned a structured profile for 16 of the 50 entities and Gemini for just 3, too few to publish a fair number, so we report the coverage and hold the score. A per-engine table with a hidden n of 3 is exactly the kind of figure this work exists to avoid.
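The "report the coverage and hold the score" rule can be sketched as a small publishing gate. The profile counts and the two agreement figures are the ones quoted above; the cut-off `MIN_PROFILES` is a hypothetical value chosen for the example, since no threshold is stated here.

```python
from typing import Dict, Optional

COHORT_SIZE = 50
MIN_PROFILES = 30  # hypothetical cut-off, assumed for illustration only

# Structured profiles returned per engine, as quoted in the text.
profiles: Dict[str, int] = {"ChatGPT": 50, "Grok": 50, "Claude": 16, "Gemini": 3}
# Record-agreement is only quoted for the two full-coverage engines.
agreement: Dict[str, float] = {"ChatGPT": 0.71, "Grok": 0.79}


def report_row(engine: str) -> dict:
    """Always publish coverage; publish a score only when n is large enough."""
    n = profiles[engine]
    score: Optional[float] = agreement.get(engine) if n >= MIN_PROFILES else None
    return {
        "engine": engine,
        "profiles": n,  # n is always shown, never hidden
        "coverage": n / COHORT_SIZE,
        "score": score,  # None means the score is held
    }


rows = [report_row(name) for name in profiles]
```

Under these assumptions ChatGPT and Grok get a published score, while Claude (32% coverage) and Gemini (6% coverage) appear with their n and coverage but a held score.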
A clean, checkable example
The disagreements that matter are the specific, verifiable ones. Ask the engines where MrBeast was born and two of them get it wrong the same way:
| Engine | Claimed birthplace | vs verified (Wichita) |
|---|---|---|
| ChatGPT (OpenAI) | Greenville, North Carolina | wrong |
| Grok (xAI) | Greenville, North Carolina | wrong |
MrBeast was born in Wichita, Kansas; Greenville, North Carolina is where he grew up. Both engines collapsed the two into a single confident, wrong answer — the tidy kind of error that reads as authoritative and gets repeated. Countable facts fare no better: asked for the PBD Podcast’s episode count in the same window, ChatGPT said 200 and Gemini said 1,000 — the engines disagree with each other five-fold on a number anyone can count. Multiply that across dates, roles, distribution and co-hosts and you have the everyday texture of the creator truth gap.
And the recommendation may not survive
Getting the facts right is only half the gap. On the subset of the cohort where we have run the multi-turn probe, Decision Survival was low: even entities the engines mention readily persisted as the recommendation through only a minority of 3–5 turn conversations. Visibility is not survival — being named in the first answer is a long way from still being the answer after the user pushes back. It is a directional reading on a subset, but it points at the harder, more valuable question for creators.
See the gap for your own channel or show
This page is the method. The numbers for you live on the free Creator AI Report Card: resolve your channel, show or name once and get your AI Visibility, per-engine accuracy against the verified record, Decision Survival, and the exact facts the engines get wrong. Each figure is stamped with the date it was observed and updates as the engines change their minds.
Questions
What is the Creator Truth Gap?
The Creator Truth Gap is the measurable distance between what is verifiably true about a creator, channel or show and what AI engines claim is true when someone asks about them. It has two parts: factual accuracy (does the engine get the record right?) and recommendation survival (once an engine recommends you, does that recommendation hold up when the user asks a follow-up question?).
How does Entidex measure per-engine accuracy?
Entidex asks each of the five engines (ChatGPT, Claude, Gemini, Perplexity, Grok) a structured set of questions about an entity, then compares every returned field to the verified record — a canonical, provenance-carrying record built from identity authorities. Accuracy is the share of decidable fields where the engine agrees with the verified record; fields the engine declines to answer are a coverage gap, counted separately, not scored as wrong. Every figure carries a 95% confidence interval and the date it was measured.
What is Decision Survival?
Decision Survival measures whether an AI recommendation persists. Entidex replays 3–5 turn conversations — the kind a real user has when choosing who to follow, book or cite — and records the share in which the entity remains the recommendation to the end rather than being displaced by a competitor. A creator can be highly visible yet rarely survive the follow-up; the two signals are different.
Why don’t you rank all five engines?
Because two of them did not answer enough to rank fairly. Across the 50-entity cohort, ChatGPT and Grok returned a parseable structured profile for every entity (50 of 50), so their record-agreement figures rest on the full cohort. Claude answered on 16 and Gemini on only 3 — too few to publish a fair ranking, so we report their coverage and hold their scores. Publishing a per-engine table with a hidden n of 3 would be exactly the kind of unverified number this work exists to avoid.
Where can I see the numbers for a specific creator?
On the free Creator AI Report Card. Resolve a channel, show or name once and the card returns that creator’s live AI Visibility, per-engine accuracy against the verified record, Decision Survival, and the exact facts the engines get wrong — each figure carrying the date it was observed. The Report Card is where per-creator numbers live and update; this page explains the method behind them.
Is a divergence from the verified record always an AI mistake?
No — and in a cohort this size we found it cuts both ways. Sometimes the engine is wrong (MrBeast was born in Wichita, not Greenville). Sometimes our own record is the stale or narrow one — a generic occupation where the engine gave a current role, or an episode count our record had not caught up on. That is exactly why we publish record-agreement, not a verdict: the figure is a floor on how well the engines do, and every disagreement is a prompt to re-verify — including our own record, which we correct when the engine turns out to be right. We never silently recode a value to close a gap.