
When Generative AI Gets It Wrong: Building Your BS Detection System

Generative AI will confidently lie to you. Here's how to catch it before it costs you your credibility, your money, or your job.

January 13, 2026 · 8 min read

This article was originally published on Medium

Marcus was proud of the market analysis he’d prepared for Monday’s executive presentation. He’d spent hours with an AI assistant, digging into competitor strategies, market trends, and growth projections. The AI had been incredibly helpful — citing studies, providing statistics, explaining complex market dynamics.

The presentation went well. Until the CFO asked: “Can you send me the McKinsey study you referenced? I’d like to read the full methodology.”

Marcus’s stomach dropped. He’d cited the study exactly as the AI had provided it. He went back to find the source. The study didn’t exist. The AI had invented it — complete with plausible author names, a publication date, and a convincing methodology description.

Marcus had just lost credibility with the executive team. Not because he used AI, but because he trusted it blindly.

The Confidence Problem

Here’s what makes AI dangerous: It never sounds uncertain.

When a human doesn’t know something, we hedge. We say “I think” or “probably” or “I’m not sure, but…” Our uncertainty shows in our tone, our word choice, our body language.

AI doesn’t do that. It generates text with the same confident tone whether it’s recounting historical fact or inventing complete fiction. There’s no vocal hesitation, no qualifying language, no disclaimer that what follows might be questionable.

This creates a psychological trap. We’re wired to trust confidence. When someone speaks with authority and certainty, our default response is to believe them — especially when they’re providing exactly the information we’re looking for.

AI will tell you that the Statue of Liberty was a gift from France in 1886 (true) with the same confidence it might tell you that the sculptor was Ernest Henri Dubois (false — it was Frederic Auguste Bartholdi). Mix enough truth with fiction, and our brains struggle to separate them.

The problem isn’t just academic. I’ve seen AI mistakes cost real money, damage real reputations, and derail real projects. And the people making these mistakes weren’t careless — they were trusting tools that felt trustworthy.

The Six Types of AI Mistakes

Before you can catch AI errors, you need to understand what you’re catching. AI doesn’t fail randomly — it fails in predictable patterns.

Type 1: Pure Hallucination

AI invents information that doesn’t exist. Studies that were never published. Historical events that never happened. People who never lived. Statistics pulled from nowhere.

Example: “According to a 2023 Stanford study, 67% of small businesses saw revenue increases after implementing AI chatbots.”

Sounds plausible. Might be completely invented.

Why it happens: AI is trained to produce plausible-sounding text based on patterns. When it doesn’t have actual information, it generates what would fit the pattern — even if that means creating fiction.

Type 2: Outdated Information

AI provides information that was once true but is no longer accurate. This is particularly dangerous because the information is verifiable — it just happened to change after the AI’s training cutoff.

Example: You ask about current legislation, company leadership, or market positions. The AI gives you accurate information — from a year ago.

Why it happens: AI models have knowledge cutoffs. They don’t know what happened after their training data stopped, but they answer as if their information is current.

Type 3: Context Collapse

AI provides technically accurate information that’s wrong for your specific context. It gives you the general answer when you need the specific exception.

Example: You ask about employment law. AI gives you federal guidelines. But your state has different requirements that override the federal standard. The information is accurate — just not for you.

Why it happens: AI optimizes for the most common case. It doesn’t understand the nuances of your industry, jurisdiction, or situation unless you explicitly provide that context.

Type 4: Confident Confusion

AI combines information from different sources or contexts in ways that create plausible but incorrect answers.

Example: You ask about treatment options for a medical condition. AI combines information from adult treatments, pediatric treatments, and veterinary medicine into a Frankenstein answer that sounds comprehensive but is medically nonsensical.

Why it happens: AI sees statistical associations between words and concepts. It doesn’t understand meaning the way humans do, so it can merge incompatible information if the word patterns seem related.

Type 5: Subtle Bias Amplification

AI reproduces and amplifies biases present in its training data, presenting skewed perspectives as neutral fact.

Example: You ask for leadership advice. AI provides examples and language that subtly assumes leadership looks male, Western, and hierarchical — because those patterns dominated its training data.

Why it happens: AI learns from human-created content, which contains human biases. Without careful intervention, those biases get encoded and amplified.

Type 6: Reasonable-Sounding Wrong Answers

AI provides an answer that seems logical, fits common sense, and sounds right — but is factually incorrect.

Example: “Penguins are found in the Arctic” sounds reasonable — a cold, icy place where you’d expect penguins. But wild penguins live almost exclusively in the Southern Hemisphere, primarily in and around Antarctica; none live in the Arctic.

Why it happens: AI optimizes for plausibility, not truth. It generates what would be a sensible answer, even if reality is different.

Your BS Detection System: The Three-Layer Defense

You can’t eliminate AI errors completely. But you can build a systematic approach to catching them before they cause problems.

Think of it as three concentric circles of defense — the further out you catch the error, the less damage it causes.

Layer 1: Immediate Red Flags (Catch During Interaction)

These are warning signs that should make you immediately skeptical of what the AI just told you.

Red Flag: Overly Specific Numbers

When AI says “exactly 67.3% of companies” or “precisely 1,247 respondents,” be suspicious. Real studies report precise figures too, but they document where those numbers came from. When AI offers hyper-precise statistics with no traceable source, the precision is often manufactured to sound authoritative.

Red Flag: Recent Citations with Perfect Details

“According to the November 2025 Harvard Business Review article by Dr. Sarah Martinez…” If AI provides full citation details for very recent sources, verify them. Fabricated citations often look unnaturally complete — author, date, and outlet all neatly supplied — so treat a polished recent citation as a candidate for verification, not as proof.

Red Flag: Suspiciously Perfect Answers

When the AI’s answer exactly matches what you hoped to hear, with no complications or caveats, be skeptical. Reality is messy. If the answer is clean and simple, it might be too good to be true.

Red Flag: Universal Statements

“All companies should…” or “Every situation requires…” Absolute statements are usually wrong. Real expertise acknowledges nuance and exceptions.

What to do: Flag these answers for verification before using them. Don’t accept them at face value just because they’re convenient.

Layer 2: Verification Protocols (Catch Before Sharing)

Before you use AI-generated information in any consequential way, run it through these verification steps.

Protocol 1: The Source Check

For any statistic, study, or quote AI provides, search for the original source. Not just the topic — the actual study or article it claims to reference.

If you can’t find the specific source in under three minutes of searching, assume it’s invented until proven otherwise.

Protocol 2: The Cross-Reference Test

Take key facts from the AI’s answer and verify them through independent sources. Don’t just Google the AI’s exact wording (you might find the AI’s own previous outputs). Search for the underlying facts using different phrasing.

Protocol 3: The Expert Sniff Test

If you have domain expertise, read the AI’s answer critically. Does the logic actually work? Are there gaps or jumps that seem convenient but questionable?

If you lack expertise, find someone who has it. Ten minutes with a subject matter expert can save you from catastrophic errors.

Protocol 4: The Timeline Check

For any information that could change over time — leadership positions, laws, statistics, market conditions — verify when this information was accurate.

What to do: Create a simple checklist. Before using AI output for anything important, run through these four protocols. It takes minutes and prevents disasters.
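One way to make that checklist concrete is a small data structure. This is a hypothetical Python sketch — the class and field names are my own illustrative choices, not an existing tool — that tracks each claim against the four protocols:

```python
from dataclasses import dataclass

@dataclass
class ClaimCheck:
    """One AI-provided claim, tracked through the four verification protocols."""
    claim: str
    source_found: bool = False      # Protocol 1: located the cited study/article
    cross_referenced: bool = False  # Protocol 2: confirmed via independent sources
    expert_reviewed: bool = False   # Protocol 3: passed a domain-expert sniff test
    timeline_verified: bool = False # Protocol 4: confirmed the info is current
    notes: str = ""

    def passed(self) -> bool:
        # A claim is usable only when every protocol has been run successfully.
        return all([self.source_found, self.cross_referenced,
                    self.expert_reviewed, self.timeline_verified])

def unverified(checks: list[ClaimCheck]) -> list[str]:
    """Return the claims that still need work before sharing."""
    return [c.claim for c in checks if not c.passed()]

checks = [
    ClaimCheck("67% of small businesses saw revenue gains from chatbots",
               notes="Could not locate the cited study"),
    ClaimCheck("Statue of Liberty dedicated in 1886",
               source_found=True, cross_referenced=True,
               expert_reviewed=True, timeline_verified=True),
]
print(unverified(checks))  # the chatbot statistic still needs verification
```

The point isn’t the code itself — a spreadsheet works just as well — but that every claim gets an explicit pass/fail against all four protocols before it goes into a deliverable.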

Layer 3: Contextual Judgment (Catch Before Consequence)

This is the highest level of BS detection — using human judgment to evaluate whether the AI’s answer actually makes sense for your specific situation.

Judgment 1: The “So What” Test

AI gave you an answer. So what? Does this actually solve your problem, or just provide information? Sometimes AI gives you accurate but useless information because it didn’t understand what you actually needed.

Judgment 2: The Unintended Consequences Check

Assume AI is technically correct. What could go wrong if you follow its advice? What might it be missing about your organizational culture, relationships, or constraints?

AI doesn’t understand office politics, customer relationships, or your company’s history. It can suggest technically optimal solutions that are organizationally disastrous.

Judgment 3: The Ethics Filter

Does this recommendation align with your values and standards? AI can suggest efficient approaches that are ethically problematic. It optimizes for the goal you stated, not for broader ethical considerations.

Judgment 4: The Gut Check

After all the logical analysis, what does your instinct say? If something feels off, even if you can’t articulate why, investigate further. Your subconscious might be catching patterns your conscious mind hasn’t processed yet.

What to do: Never skip this layer, even if the first three layers passed. This is where your irreplaceable human judgment comes in.

Building the Habit: Your 30-Day BS Detection Training

Reading about BS detection doesn’t make you good at it. You need to practice. Here’s a structured 30-day program to build the skill.

Week 1: Deliberate Skepticism

For every AI interaction this week, assume the first answer is wrong. Force yourself to verify at least one claim, even for low-stakes questions. You’re building the verification reflex.

Week 2: Pattern Recognition

Keep a log of errors you catch. What type were they? Hallucination? Outdated info? Context collapse? You’re learning what kinds of mistakes this specific AI tool tends to make.

Week 3: Speed Verification

Practice verifying faster. How quickly can you check a source? Find contradictory information? Consult an expert? You’re building efficiency into your verification process.

Week 4: Judgment Development

Focus on the third layer. For every AI answer, ask yourself: “Even if this is accurate, is it right for my situation?” You’re developing contextual judgment that AI can’t provide.

By day 30, verification should feel automatic. You’ll catch yourself questioning AI outputs naturally, without conscious effort.

The Collaboration Mindset

Here’s the crucial shift: This isn’t about distrusting AI. It’s about collaborating with it appropriately.

AI is a tool that requires supervision. Your BS detection system is that supervision.

The people who thrive with AI aren’t the ones who trust it most or distrust it most. They’re the ones who verify strategically, collaborate thoughtfully, and apply human judgment where it matters.

Your credibility depends on it. Your effectiveness requires it. Your career may hinge on it.

AI is powerful. But only when paired with humans who know how to catch its mistakes.
