Text summarisation automatically produces a shorter version of a document while retaining its most important information. There are two paradigms: **extractive** summarisation selects and stitches together key sentences directly from the source, while **abstractive** summarisation generates new sentences that may not appear verbatim in the source, much as a human summariser would. Modern approaches use transformer models (BART, T5, Pegasus) for abstractive summarisation and achieve near-human quality on news and scientific articles.
Real-life analogy: Two types of students
Imagine two students summarising a textbook chapter. The first student highlights key sentences and copies them verbatim into their notes — this is extractive summarisation. The second student reads the whole chapter, understands it, and writes the key ideas in their own words — this is abstractive summarisation. The second approach is more human-like but requires genuine understanding, not just pattern-matching.
Extractive summarisation
Extractive methods score each sentence by importance and select the top-k sentences. Classic algorithms include TF-IDF scoring (sentences containing high-TF-IDF words are considered important), TextRank (graph-based, similar to PageRank: sentences are nodes, edges are weighted by inter-sentence similarity), and LSA (Latent Semantic Analysis using SVD).
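The score-then-select idea behind these methods can be sketched in plain Python. Below is a deliberately simplified frequency-based scorer, a rough stand-in for TF-IDF scoring; the stop-word list and example sentences are made up for illustration:

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "with", "was"}

def tokenise(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def extractive_summary(text, k=2):
    """Score each sentence by mean content-word frequency; keep the
    top-k sentences, returned in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(tokenise(text))  # document-level word frequencies

    def score(sentence):
        tokens = tokenise(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]

doc = ("NLP combines linguistics with machine learning. "
       "Machine learning models process text. "
       "The weather was nice yesterday.")
print(extractive_summary(doc))  # the off-topic weather sentence is dropped
```

TextRank replaces the frequency score with a PageRank-style centrality computed over a sentence-similarity graph, but the final select-top-k step is the same.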
Extractive summarisation with TextRank (sumy library)
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer
text = """
Natural Language Processing (NLP) is a branch of artificial intelligence
that deals with the interaction between computers and humans through language.
The ultimate objective of NLP is to read, decipher, understand, and make sense
of the human language in a manner that is valuable. NLP combines computational
linguistics with statistical, machine learning, and deep learning models.
These technologies enable computers to process human language in the form of
text or voice data. Applications include machine translation, sentiment analysis,
named entity recognition, speech recognition, and question answering systems.
"""
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = TextRankSummarizer()
summary = summarizer(parser.document, sentences_count=2)
for sentence in summary:
    print(str(sentence))
# Alternative: BERT-based extractive (BertSum)
# pip install bert-extractive-summarizer
from summarizer import Summarizer
model = Summarizer()
summary = model(text, min_length=50, max_length=150)
print(summary)

Abstractive summarisation with transformers
BART (Facebook, 2019): Denoising autoencoder pre-trained to reconstruct corrupted text, fine-tuned on CNN/DailyMail for summarisation. T5 (Google, 2019): Text-To-Text Transfer Transformer — frames all NLP tasks as text-to-text. Pegasus (Google, 2020): Pre-trained specifically for summarisation by masking entire sentences (Gap Sentences Generation).
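Pegasus's Gap Sentences Generation objective can be illustrated as a data-construction step: whole sentences are removed from the input and become the generation target. A minimal sketch in plain Python (the `<mask_1>` token name is illustrative, not Pegasus's actual special token):

```python
import re

def make_gsg_example(paragraph, gap_idx):
    """Build one Gap Sentences Generation training pair: the sentence at
    gap_idx is masked out of the input and becomes the target to generate."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    target = sentences[gap_idx]
    masked = list(sentences)
    masked[gap_idx] = "<mask_1>"  # illustrative sentence-level mask token
    return " ".join(masked), target

para = ("NLP enables computers to process language. "
        "Summarisation condenses documents. "
        "ROUGE measures overlap with references.")
src, tgt = make_gsg_example(para, gap_idx=1)
print(src)  # middle sentence replaced by the mask token
print(tgt)  # "Summarisation condenses documents."
```

In the real pre-training setup, several high-importance sentences are selected (e.g. by ROUGE against the rest of the document) and masked together, which makes generating them closely resemble summarising the remaining text.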
Abstractive summarisation with BART
from transformers import pipeline
summarizer = pipeline("summarization",
model="facebook/bart-large-cnn",
device=-1) # CPU; use device=0 for GPU
article = """
The artificial intelligence startup Anthropic, founded by former OpenAI
employees, announced a major funding round that values the company at
over 15 billion dollars. The company is known for its Claude AI assistant,
which competes directly with OpenAI's ChatGPT and Google's Gemini.
Anthropic focuses heavily on AI safety research, publishing papers on
constitutional AI and interpretability. The new funding will be used to
train more powerful models and expand research into safe and beneficial AI.
Investors include Google, Amazon, and several venture capital firms.
"""
summary = summarizer(article,
max_length=80,
min_length=30,
do_sample=False, # greedy / beam decoding
num_beams=4,
)[0]["summary_text"]
print(summary)
# "Anthropic, founded by former OpenAI employees, has raised funding valuing
# the company at over 15 billion dollars. The AI safety startup is known for
# its Claude assistant, which competes with ChatGPT and Gemini."

ROUGE score — evaluating summaries
ROUGE-N: recall of N-gram overlaps between hypothesis summary and reference. ROUGE-1 (unigrams), ROUGE-2 (bigrams), ROUGE-L (longest common subsequence). Higher = better overlap with human reference summaries.
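The ROUGE-N recall described above is simple enough to compute by hand. A minimal sketch with whitespace tokenisation (real implementations such as the `rouge_score` package add stemming and proper tokenisation):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(hypothesis, reference, n=2):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(hyp[g], ref[g]) for g in ref)  # clipped counts
    return overlap / sum(ref.values())

ref = "the cat sat on the mat"
hyp = "the cat sat on a mat"
print(round(rouge_n_recall(hyp, ref, n=1), 3))  # 5 of 6 reference unigrams match
print(round(rouge_n_recall(hyp, ref, n=2), 3))  # 3 of 5 reference bigrams match
```

Note the clipping via `min`: a hypothesis cannot earn credit for repeating an n-gram more times than it appears in the reference.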
| Method | Copies source text? | Quality | Speed | Best for |
|---|---|---|---|---|
| TF-IDF scoring | Yes (sentences) | Baseline | Very fast | Quick extraction, news headlines |
| TextRank | Yes (sentences) | Good | Fast | Unsupervised, no training data |
| BERT extractive | Yes (sentences) | Better | Medium | When training data scarce |
| BART/T5 abstractive | No (generates new) | SOTA | Slow (GPU needed) | Production, research, journalism |
Practice questions
- What is the key difference between extractive and abstractive summarisation? (Answer: Extractive selects existing sentences verbatim. Abstractive generates new sentences that paraphrase the content — closer to human summarisation.)
- Why might extractive summarisation produce incoherent summaries? (Answer: Selected sentences are stitched together without considering discourse coherence — they may use pronouns whose referents are in non-selected sentences, causing confusion.)
- ROUGE-2 measures overlap of: (Answer: Bigrams (2-word sequences) between the generated summary and the reference summary. Higher ROUGE-2 indicates the model captures more two-word phrases from the reference.)
- Pegasus was pre-trained with Gap Sentences Generation. What is this? (Answer: Entire sentences are masked during pre-training, and the model must generate them from context — this directly trains the model for summarisation since generating masked sentences is similar to summarising the remaining context.)
- When would you prefer extractive over abstractive summarisation? (Answer: When faithfulness is critical (legal, medical) — extractive cannot hallucinate since it only uses source sentences. Abstractive models may generate plausible-sounding but incorrect facts.)
On LumiChats
LumiChats can summarise PDFs, web pages, and long documents using abstractive summarisation models. The Study Mode feature specifically uses summarisation to create concise notes from textbook chapters — just paste the text and ask for a summary at any detail level.
Try it free