K-Nearest Neighbours (KNN) is a simple, non-parametric, instance-based learning algorithm. It makes no assumptions about the data distribution — instead, it memorises the entire training set. For a new point, it finds the K closest training examples (neighbours) using a distance metric and assigns the majority class (classification) or the average value (regression). KNN is often called a lazy learner because it defers all computation to prediction time. It is a consistent GATE DS&AI topic, especially questions on the effect of K and distance metrics.
Real-life analogy: The neighbourhood vote
Imagine moving to a new city and trying to predict whether your neighbourhood is safe. You ask your 5 nearest neighbours about their experiences. If 4 out of 5 say it is safe, you classify the area as safe. KNN works exactly like this: for any new data point, look at the K most similar points in the training set and take a majority vote. No model is built — just memory and distance measurement.
Algorithm and distance metrics
- Store all training data (n points, p features).
- For a new query point x: compute the distance from x to every training point.
- Select K nearest neighbours — the K training points with smallest distance.
- Predict: classification = majority class vote; regression = mean of K neighbour values.
Common distance metrics for KNN. Euclidean (L2, r=2) is most common. Manhattan (L1, r=1) is more robust to outliers. Minkowski is the generalisation. Note: features must be scaled (StandardScaler) — otherwise high-magnitude features dominate the distance.
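How the three metrics relate can be sketched in a few lines (the `minkowski` helper below is my own illustration, not a library function; SciPy and sklearn provide their own implementations):

```python
import numpy as np

def minkowski(a, b, r):
    """Minkowski distance: (sum |a_i - b_i|^r)^(1/r)."""
    return np.sum(np.abs(a - b) ** r) ** (1 / r)

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

print(minkowski(a, b, 1))  # Manhattan (L1): |1-4| + |2-6| = 7.0
print(minkowski(a, b, 2))  # Euclidean (L2): sqrt(3^2 + 4^2) = 5.0
```

Setting r=1 recovers Manhattan and r=2 recovers Euclidean, which is why Minkowski is called the generalisation.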
KNN from scratch and sklearn on Iris dataset
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# ── KNN from scratch ──
class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learner: "training" just stores the data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])

    def _predict_one(self, x):
        # Euclidean distance from x to every training point
        dists = np.sqrt(np.sum((self.X_train - x)**2, axis=1))
        k_idx = np.argsort(dists)[:self.k]   # indices of the K smallest distances
        k_labels = self.y_train[k_idx]
        return Counter(k_labels).most_common(1)[0][0]  # majority vote
# Load and split data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# Feature scaling is CRITICAL for KNN
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Compare scratch vs sklearn
scratch_knn = KNNClassifier(k=5)
scratch_knn.fit(X_train, y_train)
print(f"Scratch KNN (k=5): {accuracy_score(y_test, scratch_knn.predict(X_test)):.3f}")
sklearn_knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
sklearn_knn.fit(X_train, y_train)
print(f"sklearn KNN (k=5): {accuracy_score(y_test, sklearn_knn.predict(X_test)):.3f}")
# Finding optimal K using cross-validation
from sklearn.model_selection import cross_val_score
k_vals = range(1, 21)
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5).mean()
          for k in k_vals]
best_k = k_vals[np.argmax(scores)]
print(f"Best K: {best_k} with CV accuracy: {max(scores):.3f}")
The effect of K: bias-variance trade-off
| K value | Bias | Variance | Behaviour |
|---|---|---|---|
| K=1 | Very low | Very high | Overfits — memorises every training point including noise |
| K=3 to 7 | Low | Low-Medium | Good balance — recommended starting range |
| K=√n | Medium | Medium | Rule-of-thumb: n training points → K ≈ √n |
| K=n | Very high | Very low | Underfits — always predicts the majority class |
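Both extremes in the table can be verified empirically. A quick sketch on synthetic data (the `make_classification` parameters are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# K=1: every training point is its own nearest neighbour -> 0% training error
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn1.score(X, y))  # 1.0

# K=n: the "neighbourhood" is the whole training set, so every query
# gets the same vote -> a single constant (majority-class) prediction
knn_n = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
preds = knn_n.predict(X)
print(len(set(preds)))  # 1
```

Perfect training accuracy at K=1 is exactly the overfitting the table describes: it says nothing about test accuracy.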
KNN weaknesses you must know
- Prediction cost: O(n×p) per query, since the query must be compared against every training point. Slow on large datasets.
- Memory: O(n×p), because the entire training set is stored.
- Curse of dimensionality: in high dimensions distances concentrate, so all points become nearly equidistant and "nearest" neighbours lose meaning.
- Feature scaling is mandatory: without it, high-magnitude features dominate the distance and low-magnitude features are effectively ignored.
- Speed-ups: KD-trees and ball trees reduce query time, though their advantage fades as the number of features grows.
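A minimal sketch of building and querying a KD-tree with sklearn (the data here is random filler):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 3))           # 1000 training points, 3 features

tree = KDTree(X, leaf_size=40)      # build once up front
dist, idx = tree.query(X[:1], k=5)  # 5 nearest neighbours of the first point

print(idx.shape)   # (1, 5)
print(idx[0][0])   # 0 -- the query point is its own nearest neighbour
print(dist[0][0])  # 0.0
```

`KNeighborsClassifier` can use the same structures internally via `algorithm='kd_tree'` or `algorithm='ball_tree'`; the default `'auto'` picks one based on the data.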
Practice questions (GATE-style)
- KNN with K=1 on training data gives what error? (Answer: 0% training error — it always predicts its own label since it is its own nearest neighbour.)
- What is the decision boundary of KNN with K=1? (Answer: A Voronoi diagram — each training point creates a cell; all points within the cell are classified with that point's label.)
- If feature 1 has values 0-1000 and feature 2 has values 0-1, what happens without scaling? (Answer: Feature 1 completely dominates the Euclidean distance calculation. Feature 2 has virtually no influence. Always standardise features for KNN.)
- KNN is called a lazy learner because: (Answer: It defers all computation to prediction time — no explicit training phase. All n training examples must be stored and compared at inference. Contrast with eager learners (SVM, logistic regression) that build a model during training.)
- Time complexity of KNN prediction for one query on n training points with p features: (Answer: O(n×p) — must compute distance to all n training points, each requiring p operations.)
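The scaling question above can be checked numerically. A sketch with invented values (feature 1 spans 0-1000, feature 2 spans 0-1):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

a = np.array([500.0, 0.1])
b = np.array([510.0, 0.9])   # close in feature 1's range, far in feature 2's

raw_dist = np.linalg.norm(a - b)               # sqrt(10^2 + 0.8^2) ~ 10.03
share_f1 = (a[0] - b[0])**2 / raw_dist**2
print(f"feature 1's share of squared distance: {share_f1:.1%}")  # ~99.4%

# After standardising, each feature's *relative* gap is what counts,
# and feature 2 (far apart within its own range) takes over the distance
X = np.array([[0.0, 0.0], [1000.0, 1.0], a, b])
Xs = StandardScaler().fit_transform(X)
scaled_dist = np.linalg.norm(Xs[2] - Xs[3])
share_f2 = (Xs[2, 1] - Xs[3, 1])**2 / scaled_dist**2
print(f"feature 2's share after scaling: {share_f2:.1%}")
```

Unscaled, the 10-unit gap in feature 1 swamps the 0.8 gap in feature 2 even though the latter is the larger gap relative to its range, which is precisely why standardisation is non-negotiable for KNN.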
On LumiChats
KNN is intuitive and requires no training, making it useful for quick baselines. In LumiChats, similarity search (finding relevant document chunks for RAG) uses an analogous idea — finding the K nearest embedding vectors in a vector database using approximate nearest-neighbour algorithms like HNSW.