Named Entity Recognition (NER) is an NLP task that identifies named entities in text — proper nouns referring to specific people, organisations, locations, dates, monetary values, or other domain-specific entities — and classifies them into predefined categories. NER is one of the oldest and most widely deployed NLP tasks, forming the backbone of information extraction systems in healthcare (extracting diagnoses and medications), finance (extracting companies and financial figures), journalism, and enterprise search.
Standard NER entity categories
| Category | Examples | Common use case |
|---|---|---|
| PERSON | Aditya Kumar Jha, Sam Altman, Dario Amodei | People mentioned in news, contracts, medical records |
| ORGANISATION | Anthropic, LumiChats, IIT Delhi, RBI | Company extraction, institutional mentions |
| LOCATION / GPE | Mumbai, India, Silicon Valley | Geographic analysis, travel, logistics |
| DATE / TIME | 21 March 2026, next Tuesday, Q3 2026 | Timeline extraction, scheduling, event detection |
| MONEY | ₹69/day, $200 million, €1.2 billion | Financial document analysis, contract extraction |
| PRODUCT | Claude Sonnet 4.6, iPhone 16, GPT-5.4 | Product mention tracking, competitive intelligence |
| EVENT | JEE Advanced 2026, GTC 2026, IPL 2026 | Event detection, sports, conferences |
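Toolkit label sets rarely match a reporting schema one-to-one: spaCy, for instance, emits OntoNotes-style labels (`ORG`, `GPE`, `LOC`) rather than the broader categories above. A minimal normalisation sketch — the mapping and helper name here are our own illustration, not part of spaCy:

```python
# Illustrative mapping from spaCy's OntoNotes-style labels to the broader
# categories in the table above. The grouping is an assumption, not spaCy's.
SPACY_TO_CATEGORY = {
    "PERSON": "PERSON",
    "ORG": "ORGANISATION",
    "GPE": "LOCATION / GPE",
    "LOC": "LOCATION / GPE",
    "DATE": "DATE / TIME",
    "TIME": "DATE / TIME",
    "MONEY": "MONEY",
    "PRODUCT": "PRODUCT",
    "EVENT": "EVENT",
}

def normalise(label: str) -> str:
    """Map a raw model label to a table category, defaulting to 'OTHER'."""
    return SPACY_TO_CATEGORY.get(label, "OTHER")

print(normalise("GPE"))   # → LOCATION / GPE
print(normalise("NORP"))  # → OTHER
```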
NER with spaCy (rule-based + ML) and with LLMs — both approaches compared
```python
# ── Approach 1: spaCy — fast, traditional NER ──────────────────────────────
import spacy

nlp = spacy.load("en_core_web_trf")  # transformer-based model
text = "LumiChats was founded by Aditya Kumar Jha in India. It charges ₹69/day."
doc = nlp(text)
for ent in doc.ents:
    print(f"{ent.text:30} → {ent.label_:10} ({spacy.explain(ent.label_)})")
# → LumiChats          ORG     (Companies, agencies, institutions)
# → Aditya Kumar Jha   PERSON  (People, including fictional)
# → India              GPE     (Countries, cities, states)

# ── Approach 2: LLM-based NER — better for novel entities and schemas ──────
from anthropic import Anthropic
import json

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=500,
    messages=[{"role": "user", "content": f"""
Extract all named entities from this text. Return JSON array.
Each entity: {{"text": "...", "type": "PERSON|ORG|LOCATION|DATE|MONEY|PRODUCT", "context": "..."}}
Text: "LumiChats was founded by Aditya Kumar Jha in India. It charges ₹69/day for access to Claude Sonnet 4.6, GPT-5.4, and Gemini 3 Pro."
Return only the JSON array, no other text.
"""}]
)
entities = json.loads(response.content[0].text)
for e in entities:
    print(f"{e['type']:10} | {e['text']}")
```

NER in 2026: LLMs vs traditional models
- Traditional NER (spaCy, HuggingFace NER models): Very fast (millions of documents per minute), low cost, works offline. Best for high-volume production pipelines where entity categories are well-defined and consistent.
- LLM-based NER: Dramatically better for novel entity types, domain-specific schemas, and documents with complex context. Correctly handles entities that traditional NER misses because they look unusual or span multiple words in unexpected ways. 10–100× slower and more expensive than traditional approaches.
- When to use LLMs for NER: Low-volume, high-value documents (contracts, clinical notes, research papers); custom entity schemas not covered by pretrained NER models; multilingual text with code-switching.
- When to use traditional NER: High-volume pipelines (news wire processing, log analysis); standard entity categories; latency-sensitive applications; cost-sensitive deployments.
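The decision criteria above can be condensed into a simple routing heuristic. This is a sketch only: the volume threshold and function name are illustrative assumptions, not recommended values.

```python
# Illustrative router for the LLM-vs-traditional trade-off described above.
# The 10_000-document threshold is an arbitrary placeholder.
def choose_ner_backend(doc_count: int, custom_schema: bool,
                       latency_sensitive: bool) -> str:
    """Pick 'llm' or 'traditional' NER for a batch of documents."""
    if latency_sensitive or doc_count >= 10_000:
        return "traditional"  # high volume / tight latency: fast local model
    if custom_schema:
        return "llm"          # novel entity types: LLM flexibility wins
    return "traditional"      # otherwise default to the cheap option

print(choose_ner_backend(doc_count=200, custom_schema=True,
                         latency_sensitive=False))
# → llm
```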
Practice questions
- What is the BIO (or IOB) tagging scheme used in NER and why is it necessary? (Answer: BIO tags: B- = Beginning of entity, I- = Inside entity, O = Outside any entity. Necessary because NER operates at token level and entities span multiple tokens. Without BIO: 'New York Times' as three tokens would be ambiguous (three separate entities or one?). With BIO: New/B-ORG York/I-ORG Times/I-ORG unambiguously marks a single organisation spanning all three tokens.)
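The BIO decoding described in this answer can be sketched as a small function. This is a simplified decoder: production pipelines also handle malformed tag sequences (e.g. an `I-` tag whose type differs from the open entity) more carefully.

```python
def bio_to_spans(tokens, tags):
    """Collapse parallel token/BIO-tag lists into (entity_text, type) spans."""
    spans, current, ent_type = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins; flush any open one
            if current:
                spans.append((" ".join(current), ent_type))
            current, ent_type = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)           # continue the open entity
        else:                             # "O" (or a stray I-): close any open entity
            if current:
                spans.append((" ".join(current), ent_type))
            current, ent_type = [], None
    if current:
        spans.append((" ".join(current), ent_type))
    return spans

tokens = ["New", "York", "Times", "praised", "Mumbai"]
tags   = ["B-ORG", "I-ORG", "I-ORG", "O", "B-GPE"]
print(bio_to_spans(tokens, tags))
# → [('New York Times', 'ORG'), ('Mumbai', 'GPE')]
```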
- Why does NER performance degrade significantly on social media text compared to news articles? (Answer: Social media has: informal language and abbreviations ('lol', 'gonna'), creative capitalization ('iPhone' vs 'IPHONE'), misspellings, URLs and hashtags as entities, evolving slang, and lack of sentence structure. NER models trained on news corpora are trained on clean, well-structured text — the distribution mismatch causes significant accuracy drops. Domain adaptation or fine-tuning on social media NER data is usually required.)
- What is nested NER and what challenge does it present? (Answer: Nested entities: one entity contains another. Example: 'University of New York' — 'New York' is a location inside 'University of New York' which is an organisation. Standard BIO tagging cannot represent overlapping spans. Solutions: layered BIO (one tagging layer per nesting depth), span-based NER models (classify all possible spans rather than tagging tokens sequentially).)
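The span-based alternative mentioned in this answer can be illustrated by enumerating candidate spans: because every span up to a maximum width is scored independently, overlapping entities are representable. The `max_width` value and helper name are our own; a real model would attach a classifier to each span.

```python
# Span-based NER sketch: instead of tagging tokens sequentially, enumerate
# every candidate span up to max_width tokens and classify each one
# independently, which allows overlapping (nested) entities.
def enumerate_spans(tokens, max_width=4):
    """All (start, end) spans up to max_width tokens; end is exclusive."""
    return [(i, j)
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_width, len(tokens)) + 1)]

tokens = ["University", "of", "New", "York"]
spans = enumerate_spans(tokens)
# Both the full span (0, 4) and the nested span (2, 4) are candidates, so
# ORG('University of New York') and LOC('New York') can each get a label.
print((0, 4) in spans, (2, 4) in spans)  # → True True
```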
- Compare rule-based NER (regex + gazetteers) vs transformer-based NER. When would you choose each? (Answer: Rule-based: high precision for well-defined entities (product codes, dates, postal codes), transparent and debuggable, fast, no training data needed. Use when entities follow patterns. Transformer-based (BERT fine-tuned): handles context, ambiguity, and novel entities; higher recall for complex entities. Use when entities are contextual ('Apple' as company vs fruit) or when precision < 95% with rules.)
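The rule-based side of this comparison can be sketched in a few lines: a regex for a patterned entity (dates) plus a gazetteer lookup for a closed entity list. Both the pattern and the gazetteer contents are illustrative; real systems use larger gazetteers and tokenised matching rather than raw substring search.

```python
import re

# Minimal rule-based NER sketch: regex for patterned entities + gazetteer
# for a closed list. Pattern and gazetteer entries are illustrative only.
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b")
ORG_GAZETTEER = {"Anthropic", "IIT Delhi", "RBI"}

def rule_based_ner(text):
    """Return (entity_text, type) pairs found by the rules above."""
    ents = [(m.group(), "DATE") for m in DATE_RE.finditer(text)]
    ents += [(name, "ORG") for name in ORG_GAZETTEER if name in text]
    return ents

print(rule_based_ner("RBI published the circular on 21 March 2026."))
# → [('21 March 2026', 'DATE'), ('RBI', 'ORG')]
```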
- A BERT-based NER model tags 'Paris' as B-LOC in 'Paris announced new climate policies'. Is this correct? (Answer: Incorrect. Here 'Paris' is a metonym for the French government, not a physical place, so the right label is B-GPE (Geopolitical Entity) or B-ORG depending on the annotation scheme. A contextual model should pick this up from the cue 'announced new climate policies', which is government-typical language; the error shows that metonymy remains a hard case even for contextual models, and why token-only classifiers, which see no such context, fare worse still.)