K-Nearest Neighbours (KNN) is a simple, non-parametric, instance-based learning algorithm. It makes no assumptions about the data distribution — instead, it memorises the entire training set. For a new point, it finds the K closest training examples (neighbours) using a distance metric and assigns the majority class (classification) or the average value (regression). KNN is often called a lazy learner because it defers all computation to prediction time. It is a consistent GATE DS&AI topic, especially questions on the effect of K and distance metrics.
Real-life analogy: The neighbourhood vote
Imagine moving to a new city and trying to predict whether your neighbourhood is safe. You ask your 5 nearest neighbours about their experiences. If 4 out of 5 say it is safe, you classify the area as safe. KNN works exactly like this: for any new data point, look at the K most similar points in the training set and take a majority vote. No model is built — just memory and distance measurement.
Algorithm and distance metrics
- Store all training data (n points, p features).
- For a new query point x: compute the distance from x to every training point.
- Select K nearest neighbours — the K training points with smallest distance.
- Predict: classification = majority class vote; regression = mean of K neighbour values.
Common distance metrics for KNN. Euclidean (L2, r=2) is most common. Manhattan (L1, r=1) is more robust to outliers. Minkowski is the generalisation. Note: features must be scaled (StandardScaler) — otherwise high-magnitude features dominate the distance.
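How the three metrics relate can be sketched in a few lines (the `minkowski` helper below is my own illustration, not a library function; SciPy and sklearn provide their own implementations):

```python
import numpy as np

def minkowski(a, b, r):
    """Minkowski distance: (sum |a_i - b_i|^r)^(1/r)."""
    return np.sum(np.abs(a - b) ** r) ** (1 / r)

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

print(minkowski(a, b, 1))  # Manhattan (L1): |1-4| + |2-6| = 7.0
print(minkowski(a, b, 2))  # Euclidean (L2): sqrt(3^2 + 4^2) = 5.0
```

Setting r=1 recovers Manhattan and r=2 recovers Euclidean, which is why Minkowski is called the generalisation.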
KNN from scratch and sklearn on Iris dataset
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# ── KNN from scratch ──
class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learner: "training" just stores the data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])

    def _predict_one(self, x):
        # Euclidean distance from x to every training point
        dists = np.sqrt(np.sum((self.X_train - x)**2, axis=1))
        k_idx = np.argsort(dists)[:self.k]   # indices of the K smallest distances
        k_labels = self.y_train[k_idx]
        return Counter(k_labels).most_common(1)[0][0]  # majority vote
# Load and split data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# Feature scaling is CRITICAL for KNN
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Compare scratch vs sklearn
scratch_knn = KNNClassifier(k=5)
scratch_knn.fit(X_train, y_train)
print(f"Scratch KNN (k=5): {accuracy_score(y_test, scratch_knn.predict(X_test)):.3f}")
sklearn_knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
sklearn_knn.fit(X_train, y_train)
print(f"sklearn KNN (k=5): {accuracy_score(y_test, sklearn_knn.predict(X_test)):.3f}")
# Finding optimal K using cross-validation
from sklearn.model_selection import cross_val_score
k_vals = range(1, 21)
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5).mean()
          for k in k_vals]
best_k = k_vals[np.argmax(scores)]
print(f"Best K: {best_k} with CV accuracy: {max(scores):.3f}")
The effect of K: bias-variance trade-off
| K value | Bias | Variance | Behaviour |
|---|---|---|---|
| K=1 | Very low | Very high | Overfits — memorises every training point including noise |
| K=3 to 7 | Low | Low-Medium | Good balance — recommended starting range |
| K=√n | Medium | Medium | Rule-of-thumb: n training points → K ≈ √n |
| K=n | Very high | Very low | Underfits — always predicts the majority class |
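Both extremes in the table can be verified empirically. A quick sketch on synthetic data (the `make_classification` parameters are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# K=1: every training point is its own nearest neighbour -> 0% training error
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn1.score(X, y))  # 1.0

# K=n: the "neighbourhood" is the whole training set, so every query
# gets the same vote -> a single constant (majority-class) prediction
knn_n = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
preds = knn_n.predict(X)
print(len(set(preds)))  # 1
```

Perfect training accuracy at K=1 is exactly the overfitting the table describes: it says nothing about test accuracy.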
KNN weaknesses you must know
- Prediction cost: O(n×p) per query, since the query must be compared against every training point. Slow on large datasets.
- Memory: O(n×p), because the entire training set is stored.
- Curse of dimensionality: in high dimensions distances concentrate, so all points become nearly equidistant and "nearest" neighbours lose meaning.
- Feature scaling is mandatory: without it, high-magnitude features dominate the distance and low-magnitude features are effectively ignored.
- Speed-ups: KD-trees and ball trees reduce query time, though their advantage fades as the number of features grows.
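A minimal sketch of building and querying a KD-tree with sklearn (the data here is random filler):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 3))           # 1000 training points, 3 features

tree = KDTree(X, leaf_size=40)      # build once up front
dist, idx = tree.query(X[:1], k=5)  # 5 nearest neighbours of the first point

print(idx.shape)   # (1, 5)
print(idx[0][0])   # 0 -- the query point is its own nearest neighbour
print(dist[0][0])  # 0.0
```

`KNeighborsClassifier` can use the same structures internally via `algorithm='kd_tree'` or `algorithm='ball_tree'`; the default `'auto'` picks one based on the data.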
Practice questions (GATE-style)
- KNN with K=1 on training data gives what error? (Answer: 0% training error — it always predicts its own label since it is its own nearest neighbour.)
- What is the decision boundary of KNN with K=1? (Answer: A Voronoi diagram — each training point creates a cell; all points within the cell are classified with that point's label.)
- If feature 1 has values 0-1000 and feature 2 has values 0-1, what happens without scaling? (Answer: Feature 1 completely dominates the Euclidean distance calculation. Feature 2 has virtually no influence. Always standardise features for KNN.)
- KNN is called a lazy learner because: (Answer: It defers all computation to prediction time — no explicit training phase. All n training examples must be stored and compared at inference. Contrast with eager learners (SVM, logistic regression) that build a model during training.)
- Time complexity of KNN prediction for one query on n training points with p features: (Answer: O(n×p) — must compute distance to all n training points, each requiring p operations.)
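The scaling question above can be checked numerically. A sketch with invented values (feature 1 spans 0-1000, feature 2 spans 0-1):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

a = np.array([500.0, 0.1])
b = np.array([510.0, 0.9])   # close in feature 1's range, far in feature 2's

raw_dist = np.linalg.norm(a - b)               # sqrt(10^2 + 0.8^2) ~ 10.03
share_f1 = (a[0] - b[0])**2 / raw_dist**2
print(f"feature 1's share of squared distance: {share_f1:.1%}")  # ~99.4%

# After standardising, each feature's *relative* gap is what counts,
# and feature 2 (far apart within its own range) takes over the distance
X = np.array([[0.0, 0.0], [1000.0, 1.0], a, b])
Xs = StandardScaler().fit_transform(X)
scaled_dist = np.linalg.norm(Xs[2] - Xs[3])
share_f2 = (Xs[2, 1] - Xs[3, 1])**2 / scaled_dist**2
print(f"feature 2's share after scaling: {share_f2:.1%}")
```

Unscaled, the 10-unit gap in feature 1 swamps the 0.8 gap in feature 2 even though the latter is the larger gap relative to its range, which is precisely why standardisation is non-negotiable for KNN.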
On LumiChats
KNN is intuitive and requires no training, making it useful for quick baselines. In LumiChats, similarity search (finding relevant document chunks for RAG) uses an analogous idea — finding the K nearest embedding vectors in a vector database using approximate nearest-neighbour algorithms like HNSW.