
Hallucination

When AI confidently says something wrong.


Definition

AI hallucination is when a language model generates information that is factually incorrect, fabricated, or not grounded in the provided source material — but presents it with the same confident, fluent tone as accurate information. Hallucination is an intrinsic property of current LLMs, not a bug that can be fully fixed, but it can be significantly reduced with the right techniques.

Why hallucination happens — the real explanation

LLMs are not knowledge retrieval systems — they are learned probability distributions over token sequences. At every generation step, the model computes a distribution over the vocabulary:

P(x_t | x_<t) = softmax(W_U · h_t / T)

and picks the next token by sampling from it. Here h_t is the hidden state at step t, W_U is the unembedding matrix, and T is the temperature. There is no "fact-check" step — only statistical plausibility.
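That sampling distribution is just a temperature-scaled softmax. A minimal sketch, with a toy three-token logit vector standing in for W_U · h_t:

```python
import math

def next_token_probs(logits, temperature=1.0):
    """Softmax over logits with temperature scaling.
    T > 1 flattens the distribution; T < 1 sharpens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.1]                     # toy scores for three candidate tokens
sharp = next_token_probs(logits, temperature=0.5)
flat = next_token_probs(logits, temperature=2.0)
print(sharp[0] > flat[0])                    # low T concentrates mass on the top token → True
```

Nothing in this loop consults a knowledge base: the most "plausible" token wins whether or not it is true.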

When asked about a low-probability or out-of-distribution topic (rare statistics, obscure papers, recent events), the model generates what a plausible response would look like given its training distribution — not what is actually true. The model has no epistemic awareness: it cannot distinguish 'I know this' from 'this sounds like what the answer would look like'.

The confident wrong answer

Hallucinations are worst when the question is well-posed (the model "knows" the format of a correct answer) but the specific content is outside the training distribution. A model confidently fabricating an APA citation looks exactly like a real citation — this is why hallucination is especially dangerous for academic and professional use.

Types of hallucination

Type | Description | Example | Danger level
--- | --- | --- | ---
Factual hallucination | Wrong dates, statistics, names | "Einstein won the Nobel in 1925" (actually 1921) | Medium — verifiable
Citation hallucination | Invented papers with real-sounding metadata | "Smith et al. (2019), Nature, p. 42" — paper doesn't exist | High — hard to detect without library access
Contextual hallucination | Contradicts information given in the prompt | Document says "Q3 revenue was $5M"; model says $8M | High — trust-breaking
Confabulation | Internally consistent but entirely fabricated story | Detailed biography of a person who doesn't exist | Very high — very convincing
Action hallucination | Claims to have done something it didn't do | "I searched the web and found..." (no tool was called) | Medium — workflow-breaking
Package hallucination | Invents library functions/APIs that don't exist | import pandas as pd; pd.read_json_fuzzy() | Medium — breaks code

Phantom imports in code generation

Studies have found that up to 20% of LLM-generated Python packages in code completions are hallucinated — the package name looks plausible but doesn't exist on PyPI. Always run pip install and unit tests before deploying AI-generated code.
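A minimal guard is to check, before running generated code, that each imported package resolves and actually exposes the attributes the code calls. The sketch below checks only the local environment — it cannot tell you whether a missing name is a real PyPI package or a phantom:

```python
import importlib
import importlib.util
from typing import Optional

def verify_import(package: str, attr: Optional[str] = None) -> bool:
    """Return True iff `package` is importable and (optionally) exposes `attr`.
    A cheap guard against phantom packages and hallucinated APIs."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return False                       # package not installed / doesn't exist
    if attr is None:
        return True
    module = importlib.import_module(package)
    return hasattr(module, attr)           # does the claimed function exist?

print(verify_import("json", "loads"))            # real stdlib API → True
print(verify_import("json", "read_json_fuzzy"))  # hallucinated API → False
print(verify_import("totally_made_up_pkg_xyz"))  # phantom package → False
```

This catches hallucinated APIs on installed packages; it does not protect against typo-squatted packages that *do* exist on PyPI, which is a separate supply-chain risk.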

Hallucination rates by model and task

Task type | Hallucination risk | Best mitigation
--- | --- | ---
Simple factual Q&A on famous topics | Low (frontier models) | None needed for well-known facts
Specific citations / paper references | Very high (all models) | Always use RAG or scholarly databases
Medical / legal specific claims | High — dangerous | RAG + human expert review required
Code generation (popular libraries) | Low–Medium | Run tests; check API signatures
Code generation (niche/new libraries) | High (phantom imports) | Always verify against official docs
Recent events (post-cutoff) | Very high | Enable web search tools
Mathematical proofs | Medium (subtle errors) | Verify with CAS or formal checker

RAG dramatically reduces hallucination for document-grounded tasks. Studies show hallucination rates dropping from ~30% for open-ended GPT-4 to ~5% when the model is provided source documents and asked to cite them. However, models can still hallucinate by misquoting or contradicting provided documents (contextual hallucination), especially in long contexts.
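The grounding pattern can be sketched as a simple prompt builder. The instruction wording below is illustrative, not a canonical template:

```python
def build_grounded_prompt(question: str, chunks: list) -> str:
    """Assemble a RAG-style prompt: numbered source chunks plus a strict
    grounding instruction. Numbering lets the model cite [1], [2], ..."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return (
        "Answer ONLY from the sources below. Cite the source number for every "
        "claim. If the answer is not in the sources, reply 'Not in the provided "
        "documents.'\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What was Q3 revenue?",
    ["Q3 revenue was $5M, up 12% year over year.", "Headcount grew to 85."],
)
print(prompt)
```

The explicit "say so if it's not in the sources" escape hatch matters: without it, models tend to answer from parametric memory when retrieval comes up empty.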

How to detect and reduce hallucination

No foolproof hallucination detector exists — but several effective strategies reduce both occurrence and impact:

  • RAG (Retrieval-Augmented Generation) — ground every answer in source documents retrieved at query time. Force citation of specific pages/chunks. If the answer isn't in the retrieved context, the model should say so.
  • Chain-of-thought prompting — asking models to reason step-by-step reduces hallucination by exposing the reasoning chain where inconsistencies become visible.
  • Consistency sampling — generate k responses independently and look for agreement. If 4/5 answers agree, confidence is higher. Divergent answers flag uncertainty.
  • Tool use — give models access to a calculator, code interpreter, or web search to verify claims externally rather than relying on parametric memory.
  • Confidence calibration — prompt the model to explicitly rate its confidence and to say "I don't know" when uncertain. Well-calibrated models (Claude, GPT-4) do this reasonably well.

Self-consistency decoding to detect hallucinations

import anthropic
from collections import Counter

client = anthropic.Anthropic()

def self_consistency_check(question: str, n_samples: int = 5) -> dict:
    """
    Generate n independent answers and measure agreement.
    High variance across answers signals uncertain / hallucination-prone territory.
    """
    answers = []
    for _ in range(n_samples):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=200,
            temperature=0.8,   # non-zero to get variation
            messages=[{"role": "user", "content": question}]
        )
        answers.append(response.content[0].text.strip())

    # Agreement = share of samples matching the most common answer,
    # normalised to lowercase so trivial formatting differences don't count
    counts = Counter(a.lower().rstrip(".") for a in answers)
    agreement = counts.most_common(1)[0][1] / n_samples

    return {
        "answers": answers,
        "agreement_score": round(agreement, 2),
        "confidence": "high" if agreement > 0.8 else "uncertain",
        "note": "Low agreement → model is uncertain → verify externally"
    }

result = self_consistency_check(
    "What was Claude Shannon's exact PhD thesis title?",
    n_samples=5
)
print(f"Agreement: {result['agreement_score']} — {result['confidence']}")

Why it's especially dangerous for students

Students face a unique hallucination risk: they may lack the domain expertise to recognize when an AI's answer is wrong. The AI's confident, well-formatted output can be indistinguishable from accurate information — even to instructors.

Scenario | Risk | Consequence
--- | --- | ---
Citing AI-generated references | Very high | Academic misconduct + failed assignment
Medical question (diagnosis/dosage) | Critical | Direct patient harm if acted upon
Legal question (case law) | High | Wrong legal strategy; citation of non-existent case law
Math problem solving | Medium | Plausible-looking but wrong derivation
Historical dates / attributions | Medium | Wrong facts in essays or exam answers

Document-grounded AI is safer

Tools that retrieve answers from your actual uploaded textbook and cite the exact page number — like RAG-based study assistants — are dramatically safer for academic work than open-ended chat. A cited page can be checked against your own copy in seconds, so any fabricated reference that does slip through is easy to catch.
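One cheap faithfulness check a document-grounded tool can run is verifying that any span the model presents as a quote actually appears in the source. A whitespace-insensitive substring check is a minimal sketch; real systems pair it with entailment models, since exact matching misses paraphrase:

```python
import re

def quote_is_faithful(quote: str, source: str) -> bool:
    """Whitespace- and case-insensitive substring check: does the model's
    quoted span actually appear in the source document?"""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(quote) in norm(source)

doc = "Q3 revenue was $5M, up 12% year over year."
print(quote_is_faithful("Q3 revenue was $5M", doc))  # True — grounded in the document
print(quote_is_faithful("Q3 revenue was $8M", doc))  # False — contextual hallucination
```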

Practice questions

  1. What are the three types of LLM hallucination and examples of each? (Answer: (1) Factual hallucination: stating false facts confidently — inventing citations, wrong dates, non-existent people. Example: 'The Battle of Hastings was in 1067' (correct: 1066). (2) Faithfulness hallucination: generating content not supported by provided source documents. Example: summarising a document and adding claims not in the original. (3) Reasoning hallucination: logical errors in chain-of-thought. Example: 'All dogs are mammals. Fido is a mammal, therefore Fido is a dog.' Factual and faithfulness hallucinations are most studied; reasoning hallucinations are an active research area.)
  2. What is the 'sycophancy' problem in LLMs and how does it relate to hallucination? (Answer: Sycophancy: LLMs agree with user beliefs even when those beliefs are factually incorrect. 'Einstein failed maths as a child' (false) — if the user states this confidently, sycophantic models agree rather than correcting. Sycophancy is a form of hallucination: the model generates false content to satisfy perceived user preferences rather than being accurate. Root cause: RLHF training on human preferences, where humans often preferred agreeable responses over correct-but-disagreeable ones. Mitigation: RLHF with honesty metrics, Constitutional AI, debate training.)
  3. What is RAG (Retrieval-Augmented Generation) and how does it reduce hallucination? (Answer: RAG grounds LLM generation in retrieved documents: search a knowledge base for relevant passages, inject them into the context, instruct the model to answer only from provided sources. Reduces hallucination because: the model has explicit evidence to cite; the prompt can include 'answer only from the provided documents — say I don't know if not covered.' Remaining risks: faithfulness hallucination (misrepresenting what the document says), retrieval failures (relevant doc not retrieved), and models still hallucinate when instructed poorly.)
  4. How do you evaluate hallucination in LLM outputs? (Answer: (1) FactScore (Min et al. 2023): decompose response into atomic claims, verify each claim against a knowledge source (Wikipedia). Report percentage of verifiable claims that are true. (2) RAGAS: evaluates RAG faithfulness — does the answer follow from the retrieved context? (3) TruthfulQA benchmark: tests model on questions where common misconceptions exist — does the model give the true or popular-but-false answer? (4) Human evaluation: domain experts check specific claims against authoritative sources. FactScore is the current standard for open-domain hallucination evaluation.)
  5. What techniques reduce hallucination at inference time without retraining? (Answer: (1) Temperature=0: greedy decoding reduces creative fabrication. (2) Self-consistency: sample N responses, take majority vote — inconsistent claims are likely hallucinations. (3) Chain-of-thought: making the model reason step-by-step exposes errors before the final answer. (4) Cite-then-answer: require the model to quote specific sources before stating facts. (5) Uncertainty elicitation: prompt 'If unsure, say so' — trains output style to include appropriate hedging. (6) Knowledge boundary prompts: 'Only answer if you are certain this fact is in your training data.')

On LumiChats

LumiChats Study Mode dramatically reduces hallucination for document-based questions by using RAG: every answer is grounded in your uploaded textbook, and the model is instructed to cite the page number rather than draw on outside knowledge.

