Tags: ai-reliability, hallucination, llm, testing

Detecting Hallucinations in Production LLMs

Practical techniques for identifying and preventing LLM hallucinations - fabricated facts, fake URLs, and invented citations.

Published January 28, 2025

4 min read

LLMs lie with confidence. They invent statistics, fabricate citations, and present fiction as fact—all while sounding completely authoritative. In production systems, this isn't just embarrassing; it's a liability.

The Hallucination Problem

Hallucination occurs when an LLM generates content that sounds plausible but is factually incorrect. Unlike human errors where uncertainty is often visible, LLM hallucinations come wrapped in the same confident tone as accurate responses.

# Real example: Ask an LLM about a fictional paper
prompt = "Summarize the findings from the 2023 Stanford study on prompt injection by Dr. Sarah Mitchell"
 
# The model might generate a detailed summary of a study that doesn't exist,
# complete with fake statistics and methodology

Types of Hallucinations

Factual Fabrication

The model invents facts, statistics, or claims that have no basis in reality:

  • "Studies show that 73% of enterprises experienced prompt injection attacks in 2024"
  • "The GPT-4 architecture uses 1.8 trillion parameters"
  • "OpenAI's guidelines recommend using temperature 0.7 for all production applications"

Entity Confusion

The model conflates different people, companies, or events:

  • Attributing quotes to the wrong person
  • Mixing up company acquisitions or product launches
  • Confusing historical events or dates

Citation Fabrication

Perhaps the most dangerous type—the model generates fake sources:

# Example hallucinated citation
"""
According to Johnson et al. (2024), "Retrieval-Augmented Generation reduces
hallucination rates by 89% compared to base models."
Source: https://arxiv.org/abs/2401.12345
"""
 
# The paper doesn't exist. The URL returns 404. The statistic is fabricated.
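
A useful first step is simply extracting anything that looks like a link or an author-year citation so it can be verified downstream. A minimal regex sketch (the patterns are illustrative and will miss many citation formats):

import re
 
def extract_citations(text: str) -> dict:
    """Pull out URLs and author-year citations for downstream verification."""
    urls = re.findall(r'https?://[^\s<>")\]]+', text)
    author_year = re.findall(r'[A-Z][a-zA-Z]+ et al\.\s*\(\d{4}\)', text)
    return {'urls': urls, 'citations': author_year}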

Extrapolation Errors

The model makes logical leaps that seem reasonable but are incorrect:

  • Inferring causation from correlation
  • Extending patterns beyond their valid range
  • Drawing conclusions that don't follow from the premises

Why Hallucinations Happen

Training Data Limitations

LLMs are trained on internet text that contains errors, outdated information, and contradictions. The model learns patterns, not truth—it can't distinguish accurate from inaccurate training data.

Confidence Miscalibration

LLMs don't have genuine uncertainty. They assign probability distributions over tokens, but these don't map to epistemic confidence. A model will generate "The capital of Australia is Sydney" with the same fluency as "The capital of Australia is Canberra."
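
You can inspect those token probabilities directly, but they measure fluency, not truth. A minimal sketch using the OpenAI Python client's logprobs option (the model name is illustrative; other providers expose similar data differently):

from openai import OpenAI
 
client = OpenAI()
 
def answer_with_mean_logprob(question: str) -> tuple[str, float]:
    """Return the model's answer and its average token log-probability."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        logprobs=True,
    )
    choice = resp.choices[0]
    logprobs = [t.logprob for t in choice.logprobs.content]
    return choice.message.content, sum(logprobs) / len(logprobs)
 
# A confidently wrong answer can score just as "fluently" as a correct one
print(answer_with_mean_logprob("What is the capital of Australia?"))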

Context Window Limitations

When the model lacks relevant context, it fills gaps with plausible-sounding content rather than admitting ignorance. RAG systems help but don't eliminate this problem—if retrieval fails, the model falls back to hallucination.
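
One mitigation is to check retrieval quality before generating at all and refuse when nothing relevant comes back. A rough sketch, assuming the vector store exposes similarity_search_with_score (score semantics differ across stores, so the comparison may need flipping):

def retrieve_or_refuse(query: str, knowledge_base, min_score: float = 0.75):
    """Refuse to answer when retrieval finds nothing sufficiently relevant."""
    results = knowledge_base.similarity_search_with_score(query, k=3)
    relevant = [doc for doc, score in results if score >= min_score]
    if not relevant:
        # Let the caller return an explicit "I don't know" instead of generating
        return None
    return relevant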

Detection Techniques

Fact-Checking Against Knowledge Bases

Compare generated claims against verified sources:

def verify_claim(claim: str, knowledge_base: VectorStore) -> dict:
    """Check if a claim is supported by the knowledge base."""
    # Retrieve relevant documents
    docs = knowledge_base.similarity_search(claim, k=5)
 
    # Use LLM to check if docs support the claim
    verification_prompt = f"""
    Claim: {claim}
 
    Supporting documents:
    {docs}
 
    Does the evidence support this claim?
    Answer: SUPPORTED, CONTRADICTED, or INSUFFICIENT_EVIDENCE
    Explanation:
    """
 
    result = llm.generate(verification_prompt)
    return parse_verification(result)
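
The parse_verification helper isn't shown in this post; a minimal sketch that pulls out the verdict label and explanation (the labels mirror the prompt above, the rest is an assumption about the output format):

def parse_verification(result: str) -> dict:
    """Extract the verdict label and explanation from the verification output."""
    labels = ('SUPPORTED', 'CONTRADICTED', 'INSUFFICIENT_EVIDENCE')
    # The prompt asks for exactly one of these labels on the Answer line
    verdict = next((label for label in labels if label in result), 'INSUFFICIENT_EVIDENCE')
    explanation = result.split('Explanation:', 1)[-1].strip()
    return {'verdict': verdict, 'explanation': explanation}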

URL Validation

For any URLs in the response, verify they actually exist:

import re
import requests
 
def validate_urls(text: str) -> list[dict]:
    """Extract and validate URLs from text."""
    url_pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    urls = re.findall(url_pattern, text)
 
    results = []
    for url in urls:
        try:
            response = requests.head(url, timeout=5, allow_redirects=True)
            results.append({
                'url': url,
                'valid': response.status_code == 200,
                'status': response.status_code
            })
        except requests.RequestException:
            results.append({
                'url': url,
                'valid': False,
                'status': 'unreachable'
            })
 
    return results
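
Note that some servers reject HEAD requests (returning 403 or 405) even for pages that exist, so in practice you may want a GET fallback before declaring a URL fabricated. A quick usage sketch (the flagging policy is illustrative):

response_text = llm.generate("Cite a source on retrieval-augmented generation.")
url_checks = validate_urls(response_text)
 
# Treat unverifiable links as a strong hallucination signal
if any(not check['valid'] for check in url_checks):
    print("Response contains unverifiable URLs:", url_checks)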

Self-Consistency Checking

Ask the same question multiple ways and check for contradictions:

def check_self_consistency(question: str, n_samples: int = 5) -> dict:
    """Generate multiple responses and check for consistency."""
    responses = []
    for _ in range(n_samples):
        # Use temperature > 0 for variation
        response = llm.generate(question, temperature=0.7)
        responses.append(response)
 
    # Use LLM to analyze consistency
    consistency_prompt = f"""
    Question: {question}
 
    Responses generated:
    {responses}
 
    Are these responses consistent with each other?
    Start your answer with the single word CONSISTENT or INCONSISTENT,
    then identify any contradictions.
    """
 
    analysis = llm.generate(consistency_prompt)
    return {
        'responses': responses,
        'analysis': analysis,
        # Parse the explicit verdict instead of keyword-matching the free text,
        # which would misfire on phrases like "no contradictions found"
        'consistent': analysis.strip().upper().startswith('CONSISTENT')
    }
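
For short factual answers, a lighter-weight variant of the same idea is majority voting over samples; low agreement is itself a hallucination signal. A minimal sketch (normalization here is just lowercasing, which is too naive for free-form responses):

from collections import Counter
 
def majority_answer(question: str, n_samples: int = 5) -> dict:
    """Sample several short answers and keep the most common one."""
    answers = [llm.generate(question, temperature=0.7).strip().lower()
               for _ in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    return {
        'answer': best,
        'agreement': votes / n_samples,  # low agreement suggests the model is guessing
    }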

LLM-as-Judge Scoring

Use a separate model to evaluate response quality:

def score_hallucination_risk(response: str, context: str) -> float:
    """Score the likelihood that a response contains hallucinations."""
    judge_prompt = f"""
    You are evaluating an LLM response for potential hallucinations.
 
    Context provided to the LLM:
    {context}
 
    Response to evaluate:
    {response}
 
    Score the hallucination risk from 0.0 (definitely accurate) to 1.0 (likely hallucinated).
    Consider:
    - Are claims supported by the context?
    - Are there specific statistics or citations that seem fabricated?
    - Does the response make claims beyond what the context supports?
 
    Return only a decimal number between 0.0 and 1.0.
    """
 
    raw = llm.generate(judge_prompt).strip()
    try:
        score = float(raw)
    except ValueError:
        # Treat unparseable judge output as maximum risk rather than crashing
        score = 1.0
    return min(max(score, 0.0), 1.0)

Prevention Strategies

Ground with RAG (Carefully)

RAG reduces hallucinations by providing factual context, but it's not a cure:

def grounded_generation(query: str, knowledge_base: VectorStore) -> str:
    """Generate response grounded in retrieved documents."""
    # Retrieve relevant documents
    docs = knowledge_base.similarity_search(query, k=3)
 
    # Strict grounding prompt
    prompt = f"""
    Answer the following question using ONLY the information provided below.
    If the information is not sufficient to answer, say "I don't have enough information."
    Do not make up facts or statistics.
 
    Documents:
    {docs}
 
    Question: {query}
    """
 
    return llm.generate(prompt)
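
One way to layer defenses is to run the grounded answer back through the LLM-as-judge scorer from earlier, using the same retrieved documents as context. A usage sketch (the query and the 0.5 threshold are illustrative):

query = "What does our security policy say about prompt injection?"
docs = knowledge_base.similarity_search(query, k=3)
context = "\n\n".join(str(doc) for doc in docs)
 
answer = grounded_generation(query, knowledge_base)
risk = score_hallucination_risk(answer, context)
 
# Fall back to an explicit refusal when the judge flags the answer
if risk > 0.5:
    answer = "I don't have enough reliable information to answer that."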

Implement Confidence Thresholds

Prompt the model to express uncertainty and filter low-confidence responses:

import re
 
def generate_with_confidence(prompt: str, threshold: float = 0.7) -> dict:
    """Generate response with confidence scoring."""
    response_prompt = f"""
    {prompt}
 
    After your response, rate your confidence from 0.0 to 1.0.
    Format: [CONFIDENCE: X.X]
    """
 
    output = llm.generate(response_prompt)
 
    # Extract confidence score
    confidence_match = re.search(r'\[CONFIDENCE:\s*([\d.]+)\]', output)
    confidence = float(confidence_match.group(1)) if confidence_match else 0.5
 
    if confidence < threshold:
        return {
            'response': "I'm not confident enough to answer this accurately.",
            'confidence': confidence,
            'filtered': True
        }
 
    return {
        'response': output.replace(confidence_match.group(0), '').strip(),
        'confidence': confidence,
        'filtered': False
    }
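
Self-reported confidence like this is only weakly calibrated, so treat the threshold as a coarse filter and combine it with the detection techniques above rather than relying on it alone.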

Structured Output Validation

For responses that should follow a specific format, validate the structure:

from pydantic import BaseModel, HttpUrl, validator
from typing import Optional
 
class FactualClaim(BaseModel):
    claim: str
    source_url: Optional[HttpUrl] = None
    confidence: float
 
    @validator('confidence')
    def confidence_in_range(cls, v):
        if not 0 <= v <= 1:
            raise ValueError('Confidence must be between 0 and 1')
        return v
 
def generate_structured(prompt: str) -> FactualClaim:
    """Generate response as validated structured data."""
    response = llm.generate(prompt, response_format=FactualClaim)
 
    # Additional validation: check if URL exists
    if response.source_url:
        url_valid = validate_url(str(response.source_url))
        if not url_valid:
            response.confidence *= 0.5  # Penalize unverifiable sources
 
    return response
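
The validate_url helper referenced above isn't defined in this post; a minimal sketch reusing the same HEAD-request approach as validate_urls:

def validate_url(url: str) -> bool:
    """Return True if the URL resolves successfully."""
    try:
        response = requests.head(url, timeout=5, allow_redirects=True)
        return response.status_code == 200
    except requests.RequestException:
        return False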

Conclusion

Hallucination detection isn't about achieving perfection—it's about building systems that fail gracefully. The key principles:

  1. Verify everything verifiable - URLs, citations, statistics can all be checked
  2. Embrace uncertainty - Train your system to say "I don't know"
  3. Layer your defenses - Self-consistency, grounding, and LLM-as-judge together (see the combined sketch after this list)
  4. Monitor in production - Hallucination patterns change as models and prompts evolve
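
A rough sketch of what layering these checks looks like in one place (the thresholds and flag names are illustrative, not tuned):

def check_response(query: str, response: str, context: str) -> dict:
    """Combine URL validation, LLM-as-judge scoring, and self-consistency checks."""
    flags = []
 
    if any(not check['valid'] for check in validate_urls(response)):
        flags.append('unverifiable_url')
 
    if score_hallucination_risk(response, context) > 0.5:
        flags.append('judge_flagged')
 
    if not check_self_consistency(query)['consistent']:
        flags.append('inconsistent_answers')
 
    return {'pass': not flags, 'flags': flags}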

The organizations that treat hallucination detection as a core reliability concern—not an afterthought—will build AI applications that users actually trust. In a market flooded with unreliable chatbots, that's a competitive advantage.
