AI hallucination is when a language model generates information that is factually incorrect, fabricated, or not grounded in the provided source material — but presents it with the same confident, fluent tone as accurate information. Hallucination is an intrinsic property of current LLMs, not a bug that can be fully fixed, but it can be significantly reduced with the right techniques.
Why hallucination happens — the real explanation
LLMs are not knowledge retrieval systems; they are learned probability distributions over token sequences. At every generation step, the model computes:

P(x_t | x_{<t}) = softmax(W_U h_t / T)

and picks the next token by sampling from this probability distribution. Here h_t is the hidden state after reading the context x_{<t}, W_U is the unembedding matrix, and T is the temperature. There is no "fact-check" step, only statistical plausibility.
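To make this concrete, here is a minimal pure-Python sketch of temperature sampling. The toy logits stand in for the scores W_U h_t described in the text; note that nothing in this procedure checks truth, only probability:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=0):
    """Toy temperature sampling: softmax(logits / T), then one random draw.

    `logits` stands in for W_U h_t; there is no truth-checking anywhere,
    just a draw from the resulting probability distribution.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    rng = random.Random(seed)              # seeded for reproducibility
    u, cum = rng.random(), 0.0
    for i, p in enumerate(probs):          # invert the CDF with one uniform draw
        cum += p
        if u <= cum:
            return i
    return len(probs) - 1
```

As T approaches 0 the distribution collapses onto the top-scoring token (greedy decoding), which is one reason low temperature is often recommended for factual tasks.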
When asked about a low-probability or out-of-distribution topic (rare statistics, obscure papers, recent events), the model generates what a plausible response would look like given its training distribution — not what is actually true. The model has no epistemic awareness: it cannot distinguish 'I know this' from 'this sounds like what the answer would look like'.
The confident wrong answer
Hallucinations are worst when the question is well-posed (the model "knows" the format of a correct answer) but the specific content lies outside the training distribution. A confidently fabricated APA citation looks exactly like a real one, which is why hallucination is especially dangerous for academic and professional use.
Types of hallucination
| Type | Description | Example | Danger level |
|---|---|---|---|
| Factual hallucination | Wrong dates, statistics, names | "Einstein won the Nobel in 1925" (actually 1921) | Medium — verifiable |
| Citation hallucination | Invented papers with real-sounding metadata | "Smith et al. (2019), Nature, p.42" — paper doesn't exist | High — hard to detect without library access |
| Contextual hallucination | Contradicts information given in the prompt | Document says "Q3 revenue was $5M"; model says $8M | High — trust-breaking |
| Confabulation | Internally consistent but entirely fabricated story | Detailed biography of a person who doesn't exist | Very high — very convincing |
| Action hallucination | Claims to have done something it didn't do | "I searched the web and found..." (no tool was called) | Medium — workflow-breaking |
| Package hallucination | Invents library functions/APIs that don't exist | `import pandas as pd; pd.read_json_fuzzy()` | Medium — breaks code |
Phantom imports in code generation
Studies have found that up to 20% of Python packages referenced in LLM code completions are hallucinated: the package name looks plausible but doesn't exist on PyPI. Always run `pip install` and unit tests before deploying AI-generated code.
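As a cheap first line of defense, generated code can be scanned for imports that don't resolve before anything is installed or run. The sketch below uses only the standard library; note it catches made-up module names in the current environment, not made-up attributes like `read_json_fuzzy` on a real module:

```python
import ast
import importlib.util

def find_unresolvable_imports(code: str) -> list[str]:
    """Parse generated code and flag top-level imports that don't resolve
    in the current environment, a quick check for phantom packages.
    """
    tree = ast.parse(code)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" resolves via top-level package "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)

snippet = "import json\nimport totally_made_up_pkg\n"
print(find_unresolvable_imports(snippet))   # → ['totally_made_up_pkg']
```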
Hallucination rates by model and task
| Task type | Hallucination risk | Best mitigation |
|---|---|---|
| Simple factual Q&A on famous topics | Low (frontier models) | None needed for well-known facts |
| Specific citations / paper references | Very High (all models) | Always use RAG or scholarly databases |
| Medical / legal specific claims | High — dangerous | RAG + human expert review required |
| Code generation (popular libraries) | Low–Medium | Run tests; check API signatures |
| Code generation (niche/new libraries) | High (phantom imports) | Always verify against official docs |
| Recent events (post-cutoff) | Very High | Enable web search tools |
| Mathematical proofs | Medium (subtle errors) | Verify with CAS or formal checker |
RAG dramatically reduces hallucination for document-grounded tasks. Studies have reported hallucination rates dropping from roughly 30% for open-ended GPT-4 answers to around 5% when the model is given source documents and asked to cite them. However, models can still hallucinate by misquoting or contradicting the provided documents (contextual hallucination), especially in long contexts.
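A minimal sketch of the grounding side of RAG: assembling a prompt that demands page-level citations and gives the model an explicit way out when the answer is absent. The template wording and the `[p.N]` marker format are illustrative assumptions, not a standard:

```python
def build_grounded_prompt(question: str, chunks: list[tuple[str, int]]) -> str:
    """Assemble a RAG prompt that forces page-level citation and an explicit
    'not in the sources' escape hatch. chunks = [(text, page_number), ...].
    The wording is illustrative; tune it for your model and domain.
    """
    context = "\n\n".join(f"[p.{page}] {text}" for text, page in chunks)
    return (
        "Answer ONLY from the sources below. Cite the page marker "
        "(e.g. [p.12]) for every claim. If the answer is not in the "
        "sources, reply exactly: NOT IN SOURCES.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What was Q3 revenue?",
    [("Q3 revenue was $5M, up 12% year on year.", 7)],
)
```

Giving the model an explicit refusal string ("NOT IN SOURCES") matters: without a sanctioned way to abstain, models tend to fill the gap with a plausible-sounding guess.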
How to detect and reduce hallucination
No foolproof hallucination detector exists — but several effective strategies reduce both occurrence and impact:
- RAG (Retrieval-Augmented Generation) — ground every answer in source documents retrieved at query time. Force citation of specific pages/chunks. If the answer isn't in the retrieved context, the model should say so.
- Chain-of-thought prompting — asking models to reason step-by-step reduces hallucination by exposing the reasoning chain where inconsistencies become visible.
- Consistency sampling — generate k responses independently and look for agreement. If 4/5 answers agree, confidence is higher. Divergent answers flag uncertainty.
- Tool use — give models access to a calculator, code interpreter, or web search to verify claims externally rather than relying on parametric memory.
- Confidence calibration — prompt the model to explicitly rate its confidence and to say "I don't know" when uncertain. Well-calibrated models (Claude, GPT-4) do this reasonably well.
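One lightweight way to operationalize the last bullet is to append a fixed calibration instruction to every prompt and parse the model's self-reported confidence out of its reply. The suffix wording and the `CONFIDENCE:` tag format below are assumptions for illustration:

```python
import re

# Fixed instruction appended to every prompt (wording is an illustrative
# assumption, not a standard; tune it for your model).
CALIBRATION_SUFFIX = (
    "\n\nAfter your answer, add a final line of the form "
    "'CONFIDENCE: high|medium|low'. If you are unsure, answer "
    "\"I don't know\" and use CONFIDENCE: low."
)

def parse_confidence(response_text: str) -> str:
    """Extract the model's self-reported confidence tag; treat a missing
    tag as 'low' rather than assuming certainty."""
    m = re.search(r"CONFIDENCE:\s*(high|medium|low)", response_text, re.IGNORECASE)
    return m.group(1).lower() if m else "low"
```

Routing anything parsed as "low" (or missing) to external verification is a simple, conservative policy; self-reported confidence is a useful signal, not a guarantee.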
Self-consistency decoding to detect hallucinations
```python
import anthropic

client = anthropic.Anthropic()

def self_consistency_check(question: str, n_samples: int = 5) -> dict:
    """
    Generate n independent answers and measure agreement.
    High variance across answers signals uncertain / hallucination-prone territory.
    """
    answers = []
    for _ in range(n_samples):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=200,
            temperature=0.8,  # non-zero to get variation
            messages=[{"role": "user", "content": question}],
        )
        answers.append(response.content[0].text.strip())

    # Rough agreement metric: count unique answers
    unique = len(set(answers))
    agreement = 1 - (unique - 1) / max(n_samples - 1, 1)
    return {
        "answers": answers,
        "agreement_score": round(agreement, 2),
        "confidence": "high" if agreement > 0.8 else "uncertain",
        "note": "Low agreement → model is uncertain → verify externally",
    }

result = self_consistency_check(
    "What was Claude Shannon's exact PhD thesis title?",
    n_samples=5,
)
print(f"Agreement: {result['agreement_score']} — {result['confidence']}")
```

Why it's especially dangerous for students
Students face a unique hallucination risk: they may lack the domain expertise to recognize when an AI's answer is wrong. The AI's confident, well-formatted output can be indistinguishable from accurate information — even to instructors.
| Scenario | Risk | Consequence |
|---|---|---|
| Citing AI-generated references | Very High | Academic misconduct + failed assignment |
| Medical question (diagnosis/dosage) | Critical | Direct patient harm if acted upon |
| Legal question (case law) | High | Wrong legal strategy; citation of non-existent case law |
| Math problem solving | Medium | Plausible-looking but wrong derivation |
| Historical dates / attributions | Medium | Wrong facts in essays or exam answers |
Document-grounded AI is safer
Tools that retrieve answers from your actual uploaded textbook and cite the exact page number, like RAG-based study assistants, are dramatically safer for academic work than open-ended chat. The model is far less likely to fabricate a page reference when every answer must be grounded in your specific edition of your specific textbook.
Practice questions
- What are three broad categories of LLM hallucination, with an example of each? (Answer: (1) Factual hallucination: stating false facts confidently — inventing citations, wrong dates, non-existent people. Example: 'The Battle of Hastings was in 1067' (correct: 1066). (2) Faithfulness hallucination: generating content not supported by provided source documents. Example: summarising a document and adding claims not in the original. (3) Reasoning hallucination: logical errors in chain-of-thought. Example: 'All dogs are mammals. Fido is a mammal, therefore Fido is a dog.' Factual and faithfulness hallucinations are most studied; reasoning hallucinations are an active research area.)
- What is the 'sycophancy' problem in LLMs and how does it relate to hallucination? (Answer: Sycophancy: LLMs agree with user beliefs even when those beliefs are factually incorrect. 'Einstein failed maths as a child' (false) — if the user states this confidently, sycophantic models agree rather than correcting. Sycophancy is a form of hallucination: the model generates false content to satisfy perceived user preferences rather than being accurate. Root cause: RLHF training on human preferences, where humans often preferred agreeable responses over correct-but-disagreeable ones. Mitigation: RLHF with honesty metrics, Constitutional AI, debate training.)
- What is RAG (Retrieval-Augmented Generation) and how does it reduce hallucination? (Answer: RAG grounds LLM generation in retrieved documents: search a knowledge base for relevant passages, inject them into the context, instruct the model to answer only from provided sources. Reduces hallucination because: the model has explicit evidence to cite; the prompt can include 'answer only from the provided documents — say I don't know if not covered.' Remaining risks: faithfulness hallucination (misrepresenting what the document says), retrieval failures (relevant doc not retrieved), and models still hallucinate when instructed poorly.)
- How do you evaluate hallucination in LLM outputs? (Answer: (1) FactScore (Min et al. 2023): decompose response into atomic claims, verify each claim against a knowledge source (Wikipedia). Report percentage of verifiable claims that are true. (2) RAGAS: evaluates RAG faithfulness — does the answer follow from the retrieved context? (3) TruthfulQA benchmark: tests model on questions where common misconceptions exist — does the model give the true or popular-but-false answer? (4) Human evaluation: domain experts check specific claims against authoritative sources. FactScore is the current standard for open-domain hallucination evaluation.)
- What techniques reduce hallucination at inference time without retraining? (Answer: (1) Temperature=0: greedy decoding reduces creative fabrication. (2) Self-consistency: sample N responses, take majority vote — inconsistent claims are likely hallucinations. (3) Chain-of-thought: making the model reason step-by-step exposes errors before the final answer. (4) Cite-then-answer: require the model to quote specific sources before stating facts. (5) Uncertainty elicitation: prompt 'If unsure, say so' — trains output style to include appropriate hedging. (6) Knowledge boundary prompts: 'Only answer if you are certain this fact is in your training data.')
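The FactScore/RAGAS idea from the evaluation question above can be caricatured in a few lines: split an answer into sentences and count how many are lexically supported by the source. Real evaluators verify each atomic claim with an LLM or NLI model; this word-overlap version is only a toy:

```python
import re

def toy_faithfulness_score(answer: str, source: str) -> float:
    """Fraction of answer sentences whose content words ALL appear in the
    source (deliberately strict). A crude stand-in for FactScore/RAGAS-style
    per-claim verification, which uses an LLM or NLI model instead.
    """
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    src = words(source)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = sum(1 for s in sentences if words(s) <= src)
    return supported / len(sentences)

doc = "Q3 revenue was $5M, up 12% year on year."
toy_faithfulness_score("Q3 revenue was $5M.", doc)   # 1.0: fully supported
toy_faithfulness_score("Q3 revenue was $8M.", doc)   # 0.0: contradicts the doc
```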
On LumiChats
LumiChats Study Mode dramatically reduces hallucination for document-based questions by using RAG: every answer is grounded in your uploaded textbook, and the model is instructed to cite the page number and to answer only from your document, not from outside knowledge.
Try it free