Detecting Hallucinations in Production LLMs
Practical techniques for identifying and preventing LLM hallucinations: fabricated facts, fake URLs, and invented citations.
LLMs lie with confidence. They invent statistics, fabricate citations, and present fiction as fact—all while sounding completely authoritative. In production systems, this isn't just embarrassing; it's a liability.
The Hallucination Problem
Hallucination occurs when an LLM generates content that sounds plausible but is factually incorrect. Unlike human errors where uncertainty is often visible, LLM hallucinations come wrapped in the same confident tone as accurate responses.
# Real example: Ask an LLM about a fictional paper
prompt = "Summarize the findings from the 2023 Stanford study on prompt injection by Dr. Sarah Mitchell"
# The model might generate a detailed summary of a study that doesn't exist,
# complete with fake statistics and methodology

Types of Hallucinations
Factual Fabrication
The model invents facts, statistics, or claims that have no basis in reality:
- "Studies show that 73% of enterprises experienced prompt injection attacks in 2024"
- "The GPT-4 architecture uses 1.8 trillion parameters"
- "OpenAI's guidelines recommend using temperature 0.7 for all production applications"
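Claims like these share a surface feature: a specific figure attached to a confident assertion. The following is a minimal sketch, assuming a regex heuristic is acceptable as a first pass, that flags sentences containing percentages or large magnitudes so they can be routed to verification. The name `flag_statistical_claims` and the pattern are illustrative assumptions, not an established method.

```python
import re

# Heuristic pattern (an assumption for this sketch): a percentage, or a number
# followed by a magnitude word. It will miss some claims and flag harmless ones.
_STAT_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?\s?%"
    r"|\b\d+(?:\.\d+)?\s(?:thousand|million|billion|trillion)\b",
    re.IGNORECASE,
)


def flag_statistical_claims(text: str) -> list[str]:
    """Return the sentences in `text` that contain a statistic-like figure."""
    # Split on whitespace that follows sentence-ending punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if _STAT_PATTERN.search(s)]
```

A flagged sentence is not necessarily wrong; it is only a candidate for one of the detection techniques described later.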
Entity Confusion
The model conflates different people, companies, or events:
- Attributing quotes to the wrong person
- Mixing up company acquisitions or product launches
- Confusing historical events or dates
Citation Fabrication
Perhaps the most dangerous type—the model generates fake sources:
# Example hallucinated citation
"""
According to Johnson et al. (2024), "Retrieval-Augmented Generation reduces
hallucination rates by 89% compared to base models."
Source: https://arxiv.org/abs/2401.12345
"""
# The paper doesn't exist. The URL either returns 404 or resolves to an
# unrelated paper. The statistic is fabricated.

Extrapolation Errors
The model makes logical leaps that seem reasonable but are incorrect:
- Inferring causation from correlation
- Extending patterns beyond their valid range
- Drawing conclusions that don't follow from the premises
Why Hallucinations Happen
Training Data Limitations
LLMs are trained on internet text that contains errors, outdated information, and contradictions. The model learns patterns, not truth—it can't distinguish accurate from inaccurate training data.
Confidence Miscalibration
LLMs don't express calibrated uncertainty by default. They assign probability distributions over tokens, but those probabilities measure how likely a wording is, not whether the claim is true. A model can generate "The capital of Australia is Sydney" with the same fluency as "The capital of Australia is Canberra."
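To make the distinction concrete, here is a small sketch of how token log-probabilities are commonly summarized. It assumes your provider returns per-token log-probabilities for a generated sequence; the helper name `mean_token_probability` is hypothetical.

```python
import math


def mean_token_probability(token_logprobs: list[float]) -> float:
    """Geometric mean of the token probabilities for one generated sequence."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    # Averaging in log space and exponentiating gives the geometric mean
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)
```

The result reflects how expected the wording was, so a fluent but false sentence can score as high as a true one. Treat it as a fluency signal, not as a truth score.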
Context Window Limitations
When the model lacks relevant context, it fills gaps with plausible-sounding content rather than admitting ignorance. RAG systems help but don't eliminate this problem—if retrieval fails, the model falls back to hallucination.
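One way to limit that fallback is to refuse to generate when retrieval looks weak. The sketch below assumes the retriever returns a similarity score in [0, 1] per document, with higher meaning more relevant. The `RetrievedDoc` type, the function name, and the default `min_score` of 0.75 are illustrative assumptions to be tuned on your own data.

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    text: str
    score: float  # assumed similarity in [0, 1]; higher is more relevant


def gate_retrieval(
    docs: list[RetrievedDoc], min_score: float = 0.75
) -> tuple[bool, list[RetrievedDoc]]:
    """Return (should_generate, kept_docs) for a set of retrieved documents.

    Generation is allowed only if at least one document clears `min_score`.
    """
    kept = [d for d in docs if d.score >= min_score]
    return (len(kept) > 0, kept)
```

When `should_generate` is false, the application can return a fixed "I don't have enough information" message instead of calling the model at all.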
Detection Techniques
Fact-Checking Against Knowledge Bases
Compare generated claims against verified sources:
# Assumes `llm`, `VectorStore`, and `parse_verification` are provided by your stack.
def verify_claim(claim: str, knowledge_base: VectorStore) -> dict:
    """Check if a claim is supported by the knowledge base."""
    # Retrieve relevant documents
    docs = knowledge_base.similarity_search(claim, k=5)
    # Render documents as plain text rather than interpolating the raw list
    evidence = "\n\n".join(getattr(doc, "page_content", str(doc)) for doc in docs)
    # Use LLM to check if docs support the claim
    verification_prompt = f"""
    Claim: {claim}
    Supporting documents:
    {evidence}
    Does the evidence support this claim?
    Answer: SUPPORTED, CONTRADICTED, or INSUFFICIENT_EVIDENCE
    Explanation:
    """
    result = llm.generate(verification_prompt)
    return parse_verification(result)

URL Validation
For any URLs in the response, verify they actually exist:
import re
import requests

def validate_urls(text: str) -> list[dict]:
    """Extract and validate URLs from text."""
    url_pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    # Strip trailing punctuation that belongs to the sentence, not the URL
    urls = [u.rstrip('.,;:!?)') for u in re.findall(url_pattern, text)]
    results = []
    for url in urls:
        try:
            # HEAD avoids downloading the body; some servers reject HEAD requests
            response = requests.head(url, timeout=5, allow_redirects=True)
            results.append({
                'url': url,
                # A 200 shows the page exists, not that it supports the claim
                'valid': response.status_code == 200,
                'status': response.status_code
            })
        except requests.RequestException:
            results.append({
                'url': url,
                'valid': False,
                'status': 'unreachable'
            })
    return results

Self-Consistency Checking
Sample the same question several times and check the answers for contradictions:
def check_self_consistency(question: str, n_samples: int = 5) -> dict:
    """Generate multiple responses and check for consistency."""
    responses = []
    for _ in range(n_samples):
        # Use temperature > 0 for variation
        response = llm.generate(question, temperature=0.7)
        responses.append(response)
    numbered = "\n\n".join(f"[{i + 1}] {r}" for i, r in enumerate(responses))
    # Use LLM to analyze consistency
    consistency_prompt = f"""
    Question: {question}
    Responses generated:
    {numbered}
    Are these responses consistent with each other?
    Start your answer with a single word, CONSISTENT or INCONSISTENT,
    then identify any contradictions or inconsistencies.
    """
    analysis = llm.generate(consistency_prompt)
    # Read the explicit verdict. A substring test for "contradiction" would
    # misfire on answers such as "no contradictions found".
    verdict = analysis.strip().upper()
    return {
        'responses': responses,
        'analysis': analysis,
        'consistent': verdict.startswith('CONSISTENT')
    }

LLM-as-Judge Scoring
Use a separate model to evaluate response quality:
import re

def score_hallucination_risk(response: str, context: str) -> float:
    """Score the likelihood that a response contains hallucinations."""
    judge_prompt = f"""
    You are evaluating an LLM response for potential hallucinations.
    Context provided to the LLM:
    {context}
    Response to evaluate:
    {response}
    Score the hallucination risk from 0.0 (definitely accurate) to 1.0 (likely hallucinated).
    Consider:
    - Are claims supported by the context?
    - Are there specific statistics or citations that seem fabricated?
    - Does the response make claims beyond what the context supports?
    Return only a decimal number between 0.0 and 1.0.
    """
    # Ideally `llm` here is a different model from the one that wrote `response`
    raw = llm.generate(judge_prompt)
    # Judges do not always return a bare number, so pull out the first one
    match = re.search(r'\d*\.?\d+', raw)
    if match is None:
        return 1.0  # Treat an unparseable verdict as high risk
    return min(1.0, max(0.0, float(match.group(0))))

Prevention Strategies
Ground with RAG (Carefully)
RAG reduces hallucinations by providing factual context, but it's not a cure:
def grounded_generation(query: str, knowledge_base: VectorStore) -> str:
    """Generate response grounded in retrieved documents."""
    # Retrieve relevant documents
    docs = knowledge_base.similarity_search(query, k=3)
    # Render documents as plain text rather than interpolating the raw list
    context = "\n\n".join(getattr(doc, "page_content", str(doc)) for doc in docs)
    # Strict grounding prompt
    prompt = f"""
    Answer the following question using ONLY the information provided below.
    If the information is not sufficient to answer, say "I don't have enough information."
    Do not make up facts or statistics.
    Documents:
    {context}
    Question: {query}
    """
    return llm.generate(prompt)

Implement Confidence Thresholds
Prompt the model to report a confidence score and filter low-confidence responses:
import re

def generate_with_confidence(prompt: str, threshold: float = 0.7) -> dict:
    """Generate response with confidence scoring."""
    response_prompt = f"""
    {prompt}
    After your response, rate your confidence from 0.0 to 1.0.
    Format: [CONFIDENCE: X.X]
    """
    output = llm.generate(response_prompt)
    # Extract confidence score
    confidence_match = re.search(r'\[CONFIDENCE:\s*(\d+(?:\.\d+)?)\]', output)
    if confidence_match:
        confidence = min(1.0, max(0.0, float(confidence_match.group(1))))
        answer = output.replace(confidence_match.group(0), '').strip()
    else:
        # No score reported: use a neutral default and keep the raw text
        confidence = 0.5
        answer = output.strip()
    if confidence < threshold:
        return {
            'response': "I'm not confident enough to answer this accurately.",
            'confidence': confidence,
            'filtered': True
        }
    return {
        'response': answer,
        'confidence': confidence,
        'filtered': False
    }

Structured Output Validation
For responses that should follow a specific format, validate the structure:
from typing import Optional
from pydantic import BaseModel, Field, HttpUrl

class FactualClaim(BaseModel):
    claim: str
    source_url: Optional[HttpUrl] = None
    # Range constraint; Field(ge=..., le=...) works in Pydantic v1 and v2
    confidence: float = Field(ge=0.0, le=1.0)

def generate_structured(prompt: str) -> FactualClaim:
    """Generate response as validated structured data."""
    # Assumes the LLM client can return a parsed FactualClaim instance
    response = llm.generate(prompt, response_format=FactualClaim)
    # Additional validation: check if URL exists, reusing validate_urls() from above
    if response.source_url:
        checks = validate_urls(str(response.source_url))
        url_valid = bool(checks) and checks[0]['valid']
        if not url_valid:
            response.confidence *= 0.5  # Penalize unverifiable sources
    return response

Conclusion
Hallucination detection isn't about achieving perfection—it's about building systems that fail gracefully. The key principles:
- Verify everything verifiable: URLs, citations, and statistics can all be checked
- Embrace uncertainty: design your system to say "I don't know"
- Layer your defenses: combine self-consistency checks, grounding, and LLM-as-judge scoring
- Monitor in production: hallucination patterns change as models and prompts evolve
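As a rough illustration of layering, the sketch below combines the signals from the earlier sections into a single decision. The `CheckResults` fields, the two thresholds, and the three outcome labels are assumptions made for this example, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class CheckResults:
    urls_valid: bool       # every cited URL resolved
    self_consistent: bool  # sampled answers agreed with each other
    judge_risk: float      # LLM-as-judge score, 0.0 (safe) to 1.0 (risky)


def decide(
    checks: CheckResults, block_above: float = 0.7, review_above: float = 0.3
) -> str:
    """Map layered check results to 'serve', 'review', or 'block'."""
    # Count the binary checks that failed
    hard_failures = int(not checks.urls_valid) + int(not checks.self_consistent)
    if checks.judge_risk >= block_above or hard_failures == 2:
        return "block"
    if hard_failures == 1 or checks.judge_risk >= review_above:
        return "review"
    return "serve"
```

Counting how often each outcome occurs over time gives a simple production metric: a rising share of "review" or "block" decisions after a model or prompt change is a signal worth investigating.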
The organizations that treat hallucination detection as a core reliability concern—not an afterthought—will build AI applications that users actually trust. In a market flooded with unreliable chatbots, that's a competitive advantage.