AI hallucination is when a language model generates information that is factually incorrect, fabricated, or not grounded in the provided source material — but presents it with the same confident, fluent tone as accurate information. Hallucination is an intrinsic property of current LLMs, not a bug that can be fully fixed, but it can be significantly reduced with the right techniques.
Why hallucination happens — the real explanation
LLMs are not knowledge retrieval systems; they are learned probability distributions over token sequences. At every generation step, the model computes:

P(x_t | x_{<t}) = softmax(W_U h_t / T)

and picks the next token by sampling from this probability distribution. Here h_t is the hidden state after reading the context x_{<t}, W_U is the unembedding matrix, and T is the temperature. There is no "fact-check" step, only statistical plausibility.
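To make this concrete, here is a minimal pure-Python sketch of temperature sampling. The toy logits stand in for the scores W_U h_t described in the text; note that nothing in this procedure checks truth, only probability:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=0):
    """Toy temperature sampling: softmax(logits / T), then one random draw.

    `logits` stands in for W_U h_t; there is no truth-checking anywhere,
    just a draw from the resulting probability distribution.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    rng = random.Random(seed)              # seeded for reproducibility
    u, cum = rng.random(), 0.0
    for i, p in enumerate(probs):          # invert the CDF with one uniform draw
        cum += p
        if u <= cum:
            return i
    return len(probs) - 1
```

As T approaches 0 the distribution collapses onto the top-scoring token (greedy decoding), which is one reason low temperature is often recommended for factual tasks.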
When asked about a low-probability or out-of-distribution topic (rare statistics, obscure papers, recent events), the model generates what a plausible response would look like given its training distribution — not what is actually true. The model has no epistemic awareness: it cannot distinguish 'I know this' from 'this sounds like what the answer would look like'.
The confident wrong answer
Hallucinations are worst when the question is well-posed (the model "knows" the format of a correct answer) but the specific content lies outside the training distribution. A confidently fabricated APA citation looks exactly like a real one, which is why hallucination is especially dangerous for academic and professional use.
Types of hallucination
| Type | Description | Example | Danger level |
|---|---|---|---|
| Factual hallucination | Wrong dates, statistics, names | "Einstein won the Nobel in 1925" (actually 1921) | Medium — verifiable |
| Citation hallucination | Invented papers with real-sounding metadata | "Smith et al. (2019), Nature, p.42" — paper doesn't exist | High — hard to detect without library access |
| Contextual hallucination | Contradicts information given in the prompt | Document says "Q3 revenue was $5M"; model says $8M | High — trust-breaking |
| Confabulation | Internally consistent but entirely fabricated story | Detailed biography of a person who doesn't exist | Very high — very convincing |
| Action hallucination | Claims to have done something it didn't do | "I searched the web and found..." (no tool was called) | Medium — workflow-breaking |
| Package hallucination | Invents library functions/APIs that don't exist | `import pandas as pd; pd.read_json_fuzzy()` | Medium — breaks code |
Phantom imports in code generation
Studies have found that up to 20% of Python packages referenced in LLM code completions are hallucinated: the package name looks plausible but doesn't exist on PyPI. Always run `pip install` and unit tests before deploying AI-generated code.
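As a cheap first line of defense, generated code can be scanned for imports that don't resolve before anything is installed or run. The sketch below uses only the standard library; note it catches made-up module names in the current environment, not made-up attributes like `read_json_fuzzy` on a real module:

```python
import ast
import importlib.util

def find_unresolvable_imports(code: str) -> list[str]:
    """Parse generated code and flag top-level imports that don't resolve
    in the current environment, a quick check for phantom packages.
    """
    tree = ast.parse(code)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" resolves via top-level package "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)

snippet = "import json\nimport totally_made_up_pkg\n"
print(find_unresolvable_imports(snippet))   # → ['totally_made_up_pkg']
```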
Hallucination rates by model and task
| Task type | Hallucination risk | Best mitigation |
|---|---|---|
| Simple factual Q&A on famous topics | Low (frontier models) | None needed for well-known facts |
| Specific citations / paper references | Very High (all models) | Always use RAG or scholarly databases |
| Medical / legal specific claims | High — dangerous | RAG + human expert review required |
| Code generation (popular libraries) | Low–Medium | Run tests; check API signatures |
| Code generation (niche/new libraries) | High (phantom imports) | Always verify against official docs |
| Recent events (post-cutoff) | Very High | Enable web search tools |
| Mathematical proofs | Medium (subtle errors) | Verify with CAS or formal checker |
RAG dramatically reduces hallucination for document-grounded tasks. Studies have reported hallucination rates dropping from roughly 30% for open-ended GPT-4 answers to around 5% when the model is given source documents and asked to cite them. However, models can still hallucinate by misquoting or contradicting the provided documents (contextual hallucination), especially in long contexts.
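A minimal sketch of the grounding side of RAG: assembling a prompt that demands page-level citations and gives the model an explicit way out when the answer is absent. The template wording and the `[p.N]` marker format are illustrative assumptions, not a standard:

```python
def build_grounded_prompt(question: str, chunks: list[tuple[str, int]]) -> str:
    """Assemble a RAG prompt that forces page-level citation and an explicit
    'not in the sources' escape hatch. chunks = [(text, page_number), ...].
    The wording is illustrative; tune it for your model and domain.
    """
    context = "\n\n".join(f"[p.{page}] {text}" for text, page in chunks)
    return (
        "Answer ONLY from the sources below. Cite the page marker "
        "(e.g. [p.12]) for every claim. If the answer is not in the "
        "sources, reply exactly: NOT IN SOURCES.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What was Q3 revenue?",
    [("Q3 revenue was $5M, up 12% year on year.", 7)],
)
```

Giving the model an explicit refusal string ("NOT IN SOURCES") matters: without a sanctioned way to abstain, models tend to fill the gap with a plausible-sounding guess.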
How to detect and reduce hallucination
No foolproof hallucination detector exists — but several effective strategies reduce both occurrence and impact:
- RAG (Retrieval-Augmented Generation) — ground every answer in source documents retrieved at query time. Force citation of specific pages/chunks. If the answer isn't in the retrieved context, the model should say so.
- Chain-of-thought prompting — asking models to reason step-by-step reduces hallucination by exposing the reasoning chain where inconsistencies become visible.
- Consistency sampling — generate k responses independently and look for agreement. If 4/5 answers agree, confidence is higher. Divergent answers flag uncertainty.
- Tool use — give models access to a calculator, code interpreter, or web search to verify claims externally rather than relying on parametric memory.
- Confidence calibration — prompt the model to explicitly rate its confidence and to say "I don't know" when uncertain. Well-calibrated models (Claude, GPT-4) do this reasonably well.
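One lightweight way to operationalize the last bullet is to append a fixed calibration instruction to every prompt and parse the model's self-reported confidence out of its reply. The suffix wording and the `CONFIDENCE:` tag format below are assumptions for illustration:

```python
import re

# Fixed instruction appended to every prompt (wording is an illustrative
# assumption, not a standard; tune it for your model).
CALIBRATION_SUFFIX = (
    "\n\nAfter your answer, add a final line of the form "
    "'CONFIDENCE: high|medium|low'. If you are unsure, answer "
    "\"I don't know\" and use CONFIDENCE: low."
)

def parse_confidence(response_text: str) -> str:
    """Extract the model's self-reported confidence tag; treat a missing
    tag as 'low' rather than assuming certainty."""
    m = re.search(r"CONFIDENCE:\s*(high|medium|low)", response_text, re.IGNORECASE)
    return m.group(1).lower() if m else "low"
```

Routing anything parsed as "low" (or missing) to external verification is a simple, conservative policy; self-reported confidence is a useful signal, not a guarantee.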
Self-consistency decoding to detect hallucinations
```python
import anthropic

client = anthropic.Anthropic()

def self_consistency_check(question: str, n_samples: int = 5) -> dict:
    """
    Generate n independent answers and measure agreement.
    High variance across answers signals uncertain / hallucination-prone territory.
    """
    answers = []
    for _ in range(n_samples):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=200,
            temperature=0.8,  # non-zero to get variation
            messages=[{"role": "user", "content": question}],
        )
        answers.append(response.content[0].text.strip())

    # Rough agreement metric: count unique answers
    unique = len(set(answers))
    agreement = 1 - (unique - 1) / max(n_samples - 1, 1)
    return {
        "answers": answers,
        "agreement_score": round(agreement, 2),
        "confidence": "high" if agreement > 0.8 else "uncertain",
        "note": "Low agreement → model is uncertain → verify externally",
    }

result = self_consistency_check(
    "What was Claude Shannon's exact PhD thesis title?",
    n_samples=5,
)
print(f"Agreement: {result['agreement_score']} — {result['confidence']}")
```

Why it's especially dangerous for students
Students face a unique hallucination risk: they may lack the domain expertise to recognize when an AI's answer is wrong. The AI's confident, well-formatted output can be indistinguishable from accurate information — even to instructors.
| Scenario | Risk | Consequence |
|---|---|---|
| Citing AI-generated references | Very High | Academic misconduct + failed assignment |
| Medical question (diagnosis/dosage) | Critical | Direct patient harm if acted upon |
| Legal question (case law) | High | Wrong legal strategy; citation of non-existent case law |
| Math problem solving | Medium | Plausible-looking but wrong derivation |
| Historical dates / attributions | Medium | Wrong facts in essays or exam answers |
Document-grounded AI is safer
Tools that retrieve answers from your actual uploaded textbook and cite the exact page number, like RAG-based study assistants, are dramatically safer for academic work than open-ended chat. The model is far less likely to fabricate a page reference when every answer must be grounded in your specific edition of your specific textbook.
Practice questions
- What are three broad categories of LLM hallucination, with an example of each? (Answer: (1) Factual hallucination: stating false facts confidently — inventing citations, wrong dates, non-existent people. Example: 'The Battle of Hastings was in 1067' (correct: 1066). (2) Faithfulness hallucination: generating content not supported by provided source documents. Example: summarising a document and adding claims not in the original. (3) Reasoning hallucination: logical errors in chain-of-thought. Example: 'All dogs are mammals. Fido is a mammal, therefore Fido is a dog.' Factual and faithfulness hallucinations are most studied; reasoning hallucinations are an active research area.)
- What is the 'sycophancy' problem in LLMs and how does it relate to hallucination? (Answer: Sycophancy: LLMs agree with user beliefs even when those beliefs are factually incorrect. 'Einstein failed maths as a child' (false) — if the user states this confidently, sycophantic models agree rather than correcting. Sycophancy is a form of hallucination: the model generates false content to satisfy perceived user preferences rather than being accurate. Root cause: RLHF training on human preferences, where humans often preferred agreeable responses over correct-but-disagreeable ones. Mitigation: RLHF with honesty metrics, Constitutional AI, debate training.)
- What is RAG (Retrieval-Augmented Generation) and how does it reduce hallucination? (Answer: RAG grounds LLM generation in retrieved documents: search a knowledge base for relevant passages, inject them into the context, instruct the model to answer only from provided sources. Reduces hallucination because: the model has explicit evidence to cite; the prompt can include 'answer only from the provided documents — say I don't know if not covered.' Remaining risks: faithfulness hallucination (misrepresenting what the document says), retrieval failures (relevant doc not retrieved), and models still hallucinate when instructed poorly.)
- How do you evaluate hallucination in LLM outputs? (Answer: (1) FactScore (Min et al. 2023): decompose response into atomic claims, verify each claim against a knowledge source (Wikipedia). Report percentage of verifiable claims that are true. (2) RAGAS: evaluates RAG faithfulness — does the answer follow from the retrieved context? (3) TruthfulQA benchmark: tests model on questions where common misconceptions exist — does the model give the true or popular-but-false answer? (4) Human evaluation: domain experts check specific claims against authoritative sources. FactScore is the current standard for open-domain hallucination evaluation.)
- What techniques reduce hallucination at inference time without retraining? (Answer: (1) Temperature=0: greedy decoding reduces creative fabrication. (2) Self-consistency: sample N responses, take majority vote — inconsistent claims are likely hallucinations. (3) Chain-of-thought: making the model reason step-by-step exposes errors before the final answer. (4) Cite-then-answer: require the model to quote specific sources before stating facts. (5) Uncertainty elicitation: prompt 'If unsure, say so' — trains output style to include appropriate hedging. (6) Knowledge boundary prompts: 'Only answer if you are certain this fact is in your training data.')
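The FactScore/RAGAS idea from the evaluation question above can be caricatured in a few lines: split an answer into sentences and count how many are lexically supported by the source. Real evaluators verify each atomic claim with an LLM or NLI model; this word-overlap version is only a toy:

```python
import re

def toy_faithfulness_score(answer: str, source: str) -> float:
    """Fraction of answer sentences whose content words ALL appear in the
    source (deliberately strict). A crude stand-in for FactScore/RAGAS-style
    per-claim verification, which uses an LLM or NLI model instead.
    """
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    src = words(source)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = sum(1 for s in sentences if words(s) <= src)
    return supported / len(sentences)

doc = "Q3 revenue was $5M, up 12% year on year."
toy_faithfulness_score("Q3 revenue was $5M.", doc)   # 1.0: fully supported
toy_faithfulness_score("Q3 revenue was $8M.", doc)   # 0.0: contradicts the doc
```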
On LumiChats
LumiChats Study Mode dramatically reduces hallucination for document-based questions by using RAG: every answer is grounded in your uploaded textbook, and the model is instructed to cite the page number and to answer only from your document, not from outside knowledge.
Try it free