Practical AI ยท Steal This

Can You Trust AI to Check Its Own Work?

The honest answer is no. Here's the fix, and here's us using it live.

Credit where it's due

This idea is Dharmesh Shah's (co-founder of HubSpot), from his June 24 newsletter on why you can't trust AI to review its own work. Source: @dharmesh.

We took it and ran with it on our own work, and we'll show you exactly what happened.

A moment you'll recognize

You ask AI to write something. Then you ask that same chat, "is this good? any mistakes?" A second later, with total confidence: "Looks great!"

It's not great. The obvious mistakes you hoped it would catch are still sitting right there.

Why it fails

The thing that made the work is the worst thing to review it.

When you ask the same AI to find its own mistakes, it's reasoning from the same memory that produced the work. Same information, same blind spots. It can't catch what it couldn't see in the first place.

The fix: bring in a second AI to check the first one.

The fix, in plain language

A second AI with a fresh, blank sheet of paper.

Picture the AI working on one sheet of paper: your question, what you pasted, the whole conversation, all on that one page. When it fills up, the oldest stuff slides off. That's why long chats start to "forget."

The reviewer gets a blank sheet with only the finished work on it. It never saw how you got there, so it can't inherit your blind spots. Like a coworker spotting your typo in two seconds.

Even better

Run a few of them at once, each a different kind of reviewer.

Because none of them can see each other's page, you get genuinely independent opinions, all at the same time. Dharmesh calls it a panel.

One honest catch: more reviewers cost more (each one is real compute). So you use judgment, not brute force.

So here's what we did

We pointed three of them at our own work.

We took the news rundown we'd built for the show and sent three blind AI reviewers at it. None of them knew we made it, how we made it, or about each other. Three blank sheets. Three different jobs.

Meet the three
๐Ÿง‘โ€๐Ÿ’ผ
The everyday viewer
A busy professional with no AI jargon. Tunes out the second it gets confusing.
๐ŸŽฌ
The tough producer
Judges one thing: will people keep watching, or click away?
๐Ÿ”
The relentless fact-checker
Assumes every number and label is wrong until a real source proves it.
What they actually said back (real, unedited)
๐Ÿง‘โ€๐Ÿ’ผ The everyday viewer
"I don't know what Sakana is. I don't know what Fugu is. And 'orchestrate many models' means nothing to me. 'Built on Sakana's ICLR 2026 routing papers' โ€” that's where I closed the tab."
๐ŸŽฌ The tough producer
"The lead is being narrated instead of weaponized. That's a spec sheet. A viewer will tune out in the first eight words. Run it Fugu, then the stat, then the reveal."
And then the fact-checker caught us
๐Ÿ” The fact-checker
"The RAISE US story is labeled 'official.' The source is AP. AP is a wire service, not the company's own announcement. That label is wrong."

It was right. We'd made that mistake ourselves and couldn't see it.

The point

A fresh set of AI eyes caught, in one pass, what we couldn't see in our own work.

We fixed it before it ever reached you. That's the whole idea.

And we'll be honest: it costs more, and not every note was right (one reviewer griped about things that didn't matter). The reviewers surface. The human decides.

Steal this โ€” no coding required

  • Never let AI grade its own homework. Open a brand-new chat.
  • Paste only the finished work, not the whole conversation: "You've never seen this. Find what's wrong."
  • Give it a role for more: "You're a skeptical customer. What's confusing?"
  • Want a panel? Run it three times with three roles. What more than one of them flags is the real problem.
The takeaway

Stop trusting one AI to get it right in one shot. Give it a second, independent set of eyes, and you catch what you can't see yourself.

Idea credit: Dharmesh Shah (@dharmesh), June 24 newsletter.

โ† โ†’ arrows to move
Practical AI ยท Ep47