Published January 28, 2025
4 min read
LLMs lie with confidence. They invent statistics, fabricate citations, and present fiction as fact—all while sounding completely authoritative. In production systems, this isn't just embarrassing; it's a liability.
Hallucination occurs when an LLM generates content that sounds plausible but is factually incorrect. Unlike human errors where uncertainty is often visible, LLM hallucinations come wrapped in the same confident tone as accurate responses.
# Real example: Ask an LLM about a fictional paper
prompt = "Summarize the findings from the 2023 Stanford study on prompt injection by Dr. Sarah Mitchell"

# The model might generate a detailed summary of a study that doesn't exist,
# complete with fake statistics and methodology.

The model invents facts, statistics, or claims that have no basis in reality:
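A hypothetical illustration (the question, the figure, and the survey it is attributed to are all invented for demonstration):

# Prompt: "What percentage of Fortune 500 companies run Kubernetes in production?"
# A model might answer:
"""
As of 2024, 94% of Fortune 500 companies run Kubernetes in production,
according to an annual enterprise infrastructure survey.
"""
# The precise figure and the survey are generated on the fly;
# neither can be traced to a real source.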
The model conflates different people, companies, or events:
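A hypothetical sketch of what conflation looks like (all names below are invented):

# Prompt: "Who founded the database company Acme Data?"
# A model might answer:
"""
Acme Data was founded in 2015 by Maria Chen, who previously built the
open-source project DuckStore at Stanford before it was acquired by a
large cloud vendor.
"""
# Each detail may resemble something real, but the model has stitched
# attributes from different people and companies into a single biography.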
Perhaps the most dangerous type—the model generates fake sources:
# Example hallucinated citation
"""
According to Johnson et al. (2024), "Retrieval-Augmented Generation reduces
hallucination rates by 89% compared to base models."
Source: https://arxiv.org/abs/2401.12345
"""
# The paper doesn't exist, the URL resolves to nothing or to an unrelated
# paper, and the statistic is fabricated.

The model makes logical leaps that seem reasonable but are incorrect:
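A hypothetical illustration of this failure mode (the reasoning below is deliberately wrong):

# Prompt: "Our API handles 200 requests per second per instance.
#          How many instances do we need for 1 million requests per day?"
# A model might answer:
"""
1,000,000 requests / 200 requests per second = 5,000 instances.
"""
# The arithmetic sounds authoritative but skips the unit conversion:
# 1,000,000 requests per day is roughly 11.6 requests per second on average,
# so a single instance covers the average load (traffic peaks are another matter).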
LLMs are trained on internet text that contains errors, outdated information, and contradictions. The model learns patterns, not truth—it can't distinguish accurate from inaccurate training data.
LLMs don't have genuine uncertainty. They assign probability distributions over tokens, but these don't map to epistemic confidence. A model will generate "The capital of Australia is Sydney" with the same fluency as "The capital of Australia is Canberra."
When the model lacks relevant context, it fills gaps with plausible-sounding content rather than admitting ignorance. RAG systems help but don't eliminate this problem—if retrieval fails, the model falls back to hallucination.
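One practical guard is to detect retrieval failure before generating at all. Below is a minimal sketch, assuming the vector store can return similarity scores alongside documents; the similarity_search_with_score call, its score convention, and the 0.75 threshold are assumptions, not fixed APIs:

def retrieve_or_refuse(query: str, knowledge_base, min_score: float = 0.75):
    """Return documents only when retrieval looks strong enough to ground an answer."""
    results = knowledge_base.similarity_search_with_score(query, k=3)
    # Keep only matches above the threshold; note that some stores return
    # distances (lower is better) rather than similarities (higher is better).
    docs = [doc for doc, score in results if score >= min_score]
    if not docs:
        # Nothing relevant was retrieved: refuse rather than let the model
        # fill the gap with plausible-sounding content.
        return None
    return docs

Callers can then return a canned "I don't have enough information" answer whenever this returns None, instead of generating from an empty context.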
Compare generated claims against verified sources:
def verify_claim(claim: str, knowledge_base: VectorStore) -> dict:
    """Check if a claim is supported by the knowledge base."""
    # Retrieve relevant documents
    docs = knowledge_base.similarity_search(claim, k=5)

    # Use the LLM to check whether the documents support the claim
    verification_prompt = f"""
    Claim: {claim}

    Supporting documents:
    {docs}

    Does the evidence support this claim?
    Answer: SUPPORTED, CONTRADICTED, or INSUFFICIENT_EVIDENCE
    Explanation:
    """

    result = llm.generate(verification_prompt)
    return parse_verification(result)

For any URLs in the response, verify they actually exist:
import re
import requests

def validate_urls(text: str) -> list[dict]:
    """Extract and validate URLs from text."""
    url_pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    urls = re.findall(url_pattern, text)

    results = []
    for url in urls:
        try:
            response = requests.head(url, timeout=5, allow_redirects=True)
            results.append({
                'url': url,
                'valid': response.status_code == 200,
                'status': response.status_code
            })
        except requests.RequestException:
            results.append({
                'url': url,
                'valid': False,
                'status': 'unreachable'
            })
    return results

Ask the same question multiple ways and check for contradictions:
def check_self_consistency(question: str, n_samples: int = 5) -> dict:
    """Generate multiple responses and check for consistency."""
    responses = []
    for _ in range(n_samples):
        # Use temperature > 0 so the samples actually vary
        response = llm.generate(question, temperature=0.7)
        responses.append(response)

    # Use the LLM to analyze consistency across the samples
    consistency_prompt = f"""
    Question: {question}

    Responses generated:
    {responses}

    Are these responses consistent with each other?
    Identify any contradictions or inconsistencies.
    """

    analysis = llm.generate(consistency_prompt)
    return {
        'responses': responses,
        'analysis': analysis,
        # Crude keyword heuristic; a structured verdict from the judge is more robust
        'consistent': 'contradiction' not in analysis.lower()
    }

Use a separate model to evaluate response quality:
def score_hallucination_risk(response: str, context: str) -> float:
    """Score the likelihood that a response contains hallucinations."""
    judge_prompt = f"""
    You are evaluating an LLM response for potential hallucinations.

    Context provided to the LLM:
    {context}

    Response to evaluate:
    {response}

    Score the hallucination risk from 0.0 (definitely accurate) to 1.0 (likely hallucinated).
    Consider:
    - Are claims supported by the context?
    - Are there specific statistics or citations that seem fabricated?
    - Does the response make claims beyond what the context supports?

    Return only a decimal number between 0.0 and 1.0.
    """
    # Assumes the judge follows the format instruction and returns a bare number
    score = float(llm.generate(judge_prompt).strip())
    return score

RAG reduces hallucinations by providing factual context, but it's not a cure:
def grounded_generation(query: str, knowledge_base: VectorStore) -> str:
    """Generate a response grounded in retrieved documents."""
    # Retrieve relevant documents
    docs = knowledge_base.similarity_search(query, k=3)

    # Strict grounding prompt
    prompt = f"""
    Answer the following question using ONLY the information provided below.
    If the information is not sufficient to answer, say "I don't have enough information."
    Do not make up facts or statistics.

    Documents:
    {docs}

    Question: {query}
    """
    return llm.generate(prompt)

Prompt the model to express uncertainty and filter low-confidence responses:
def generate_with_confidence(prompt: str, threshold: float = 0.7) -> dict:
    """Generate a response with self-reported confidence scoring."""
    response_prompt = f"""
    {prompt}

    After your response, rate your confidence from 0.0 to 1.0.
    Format: [CONFIDENCE: X.X]
    """
    output = llm.generate(response_prompt)

    # Extract the self-reported confidence score
    confidence_match = re.search(r'\[CONFIDENCE:\s*([\d.]+)\]', output)
    confidence = float(confidence_match.group(1)) if confidence_match else 0.5

    if confidence < threshold:
        return {
            'response': "I'm not confident enough to answer this accurately.",
            'confidence': confidence,
            'filtered': True
        }

    # Strip the confidence marker before returning the response
    cleaned = output.replace(confidence_match.group(0), '').strip() if confidence_match else output.strip()
    return {
        'response': cleaned,
        'confidence': confidence,
        'filtered': False
    }

For responses that should follow a specific format, validate the structure:
from pydantic import BaseModel, HttpUrl, validator
from typing import Optional

class FactualClaim(BaseModel):
    claim: str
    source_url: Optional[HttpUrl]
    confidence: float

    @validator('confidence')
    def confidence_in_range(cls, v):
        if not 0 <= v <= 1:
            raise ValueError('Confidence must be between 0 and 1')
        return v

def generate_structured(prompt: str) -> FactualClaim:
    """Generate a response as validated structured data."""
    response = llm.generate(prompt, response_format=FactualClaim)

    # Additional validation: check that the cited URL actually resolves
    # (a single-URL variant of validate_urls above)
    if response.source_url:
        url_valid = validate_url(str(response.source_url))
        if not url_valid:
            response.confidence *= 0.5  # Penalize unverifiable sources
    return response

Hallucination detection isn't about achieving perfection; it's about building systems that fail gracefully.
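As a closing sketch, the checks above can be composed into a single guarded call. This is illustrative only: it reuses the helper functions defined earlier in this post (grounded_generation, validate_urls, score_hallucination_risk), and the 0.6 risk threshold is an arbitrary assumption to tune for your application:

def answer_with_guardrails(query: str, knowledge_base, max_risk: float = 0.6) -> str:
    """Compose the earlier checks so failures degrade to a refusal, not a guess."""
    # Re-retrieve the grounding documents so the judge sees the same context
    docs = knowledge_base.similarity_search(query, k=3)
    answer = grounded_generation(query, knowledge_base)

    # Layer 1: any cited URL must actually resolve
    if any(not r['valid'] for r in validate_urls(answer)):
        return "I couldn't verify the sources for this answer, so I won't provide it."

    # Layer 2: an LLM judge scores hallucination risk against the retrieved context
    risk = score_hallucination_risk(answer, context=str(docs))
    if risk > max_risk:
        return "I don't have enough verified information to answer this reliably."

    return answer

No single layer is reliable on its own; the value comes from stacking imperfect checks and defaulting to a refusal when they disagree.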
The organizations that treat hallucination detection as a core reliability concern—not an afterthought—will build AI applications that users actually trust. In a market flooded with unreliable chatbots, that's a competitive advantage.