A biological neuron receives input signals, processes them, and fires an output signal. An artificial neuron (perceptron) mimics this: it takes weighted inputs, sums them with a bias, applies an activation function, and produces an output. Frank Rosenblatt's 1958 perceptron started the neural network era. A single perceptron can only classify linearly separable problems; stack many neurons in layers and you get a neural network that can approximate almost any continuous function.
Real-life analogy: The voting committee member
A neuron is like a committee member who votes YES or NO. Each piece of evidence (input feature) has a different importance to them (weight). They add up all the weighted evidence, subtract their personal scepticism threshold (bias), and if the total exceeds their threshold — they vote YES (activate). Multiple committee members connected together in layers form the neural network.
Anatomy of a single neuron
z = pre-activation (weighted sum + bias). w = weights (learnable). x = inputs. b = bias (learnable). σ = activation function. The weights control how much each input influences the output. The bias shifts the activation threshold.
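To make the notation concrete, here is a minimal sketch of one neuron's computation with NumPy. The input, weight, and bias values are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5])   # inputs
w = np.array([0.4, -0.6, 0.2])  # weights (learnable)
b = 0.1                         # bias (learnable)

z = np.dot(w, x) + b            # pre-activation: z = w'x + b
a = 1 / (1 + np.exp(-z))        # activation: sigmoid(z)

print(z)  # 0.4 - 1.2 + 0.1 + 0.1 = -0.6
print(a)  # sigmoid(-0.6) ~ 0.354
```

The whole forward pass of a neuron is just these two lines: a dot product plus a bias, then a non-linearity.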
Single neuron implemented from scratch
import numpy as np

class Neuron:
    """A single artificial neuron with configurable activation."""

    def __init__(self, n_inputs: int, activation: str = 'relu'):
        # Weights: He initialisation — the sqrt(2/n) scale suits ReLU
        # and helps prevent vanishing/exploding gradients
        self.weights = np.random.randn(n_inputs) * np.sqrt(2.0 / n_inputs)
        self.bias = 0.0
        self.activation = activation

    def _activate(self, z: float) -> float:
        if self.activation == 'relu':
            return max(0.0, z)
        if self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-z))
        if self.activation == 'tanh':
            return np.tanh(z)
        if self.activation == 'linear':
            return z
        raise ValueError(f"Unknown activation: {self.activation}")

    def forward(self, x: np.ndarray) -> float:
        z = np.dot(self.weights, x) + self.bias  # weighted sum + bias
        return self._activate(z)

    def __repr__(self):
        return f"Neuron(inputs={len(self.weights)}, activation={self.activation})"
# Single neuron: "Is this email spam?" (5 features)
np.random.seed(42)
neuron = Neuron(n_inputs=5, activation='sigmoid')
email_features = np.array([0.8, 0.1, 0.9, 0.0, 0.5]) # word freq, link count, etc.
prob_spam = neuron.forward(email_features)
print(f"Neuron output: {prob_spam:.4f} {'spam' if prob_spam > 0.5 else 'ham'}")
# Manually set weights and bias (simulating what training learns)
neuron.weights = np.array([0.9, -0.2, 0.8, 0.3, 0.1]) # High weight for spam words
neuron.bias = -0.5 # High bar to classify as spam
prob_spam_tuned = neuron.forward(email_features)
print(f"Tuned neuron: {prob_spam_tuned:.4f}")
# A layer of 4 neurons working on the same input
class Layer:
    def __init__(self, n_inputs, n_neurons, activation='relu'):
        # Weight matrix: (n_inputs × n_neurons) — each column is one neuron
        self.W = np.random.randn(n_inputs, n_neurons) * np.sqrt(2.0 / n_inputs)
        self.b = np.zeros(n_neurons)

    def forward(self, x):
        z = x @ self.W + self.b  # shape: (batch, n_neurons)
        return np.maximum(0, z)  # ReLU applied element-wise
layer = Layer(n_inputs=5, n_neurons=4)
batch = np.random.randn(32, 5) # 32 examples, 5 features each
output = layer.forward(batch)
print(f"Layer output shape: {output.shape}")  # (32, 4)
Weights, biases, and what they learn
Weights control the strength and direction of each connection. A large positive weight means 'this input strongly activates this neuron.' A negative weight means 'this input suppresses this neuron.' Bias shifts the activation threshold independently of inputs — it is like the intercept in linear regression. A positive bias makes the neuron more likely to activate; negative makes it less likely.
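A small sketch (made-up values) of how weight sign and bias shift a sigmoid neuron's output:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))
x = 1.0  # a single active input feature

excite = sigmoid(+2.0 * x)       # large positive weight: strong activation
inhibit = sigmoid(-2.0 * x)      # negative weight: suppressed activation
eager = sigmoid(0.5 * x + 1.5)   # positive bias raises the baseline
wary = sigmoid(0.5 * x - 1.5)    # negative bias lowers it

print(f"{excite:.3f} {inhibit:.3f} {eager:.3f} {wary:.3f}")
```

With the same input, the positive weight pushes the output towards 1, the negative weight towards 0, and the bias shifts the output without touching the input at all.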
| Component | Symbol | Initialised to | What it controls |
|---|---|---|---|
| Weight | w | Small random (Xavier, He) | How much each input influences this neuron |
| Bias | b | Zero | Baseline activation level — shifts decision boundary |
| Activation | σ(z) | Fixed (design choice) | Introduces non-linearity — enables learning complex patterns |
| Pre-activation | z = wᵀx + b | Computed | Raw signal before non-linearity |
Why biases are essential
Without bias, every neuron's decision boundary passes through the origin — severely limiting expressiveness. With bias, the neuron can place its decision boundary anywhere in feature space. Example: a ReLU neuron without bias fires for z > 0, i.e., wᵀx > 0. With bias b = -3, it fires only when wᵀx > 3, so it can model any threshold.
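A quick sketch of that effect, with illustrative values:

```python
import numpy as np

w = np.array([1.0, 1.0])
x = np.array([2.0, 0.5])  # w'x = 2.5

relu = lambda z: max(0.0, z)

no_bias = relu(np.dot(w, x))          # fires whenever w'x > 0
with_bias = relu(np.dot(w, x) - 3.0)  # bias b = -3: fires only when w'x > 3

print(no_bias)    # 2.5 — the input clears the origin threshold
print(with_bias)  # 0.0 — the same input fails the shifted threshold
```

Same weights, same input — the bias alone decides where the firing threshold sits.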
From one neuron to a network — depth vs width
Width = number of neurons per layer. More neurons → more patterns captured at each level of abstraction. Depth = number of layers. More layers → higher-level features built from lower-level ones (edges → shapes → objects in vision). For the same parameter count, deeper networks tend to outperform wider ones on most tasks. This is why it is called 'deep' learning rather than 'wide' learning.
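For intuition about parameter budgets, here is a sketch comparing a wide and a deep fully connected network. The layer sizes are hypothetical, chosen only to illustrate the counting:

```python
def dense_params(sizes):
    """Total weights + biases for fully connected layers of the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

wide = dense_params([100, 512, 1])         # one wide hidden layer
deep = dense_params([100, 64, 64, 64, 1])  # three narrower hidden layers

print(wide)  # 52225 parameters
print(deep)  # 14849 parameters
```

The deeper network here uses far fewer parameters yet composes three levels of features, which is the trade-off the paragraph above describes.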
Practice questions
- A neuron has weights [0.5, -0.3, 0.8] and bias -0.2. Input is [1, 2, 0]. Compute the pre-activation z. (Answer: z = 0.5×1 + (-0.3)×2 + 0.8×0 + (-0.2) = 0.5 - 0.6 + 0 - 0.2 = -0.3)
- What would happen if all weights were initialised to the same value (e.g., all zeros)? (Answer: the symmetry problem — all neurons in a layer compute identical outputs and receive identical gradients, so they update to the same values and remain identical forever. Random initialisation breaks this symmetry so each neuron learns different features.)
- Why can a single perceptron not solve the XOR problem? (Answer: XOR is not linearly separable — no single hyperplane (wᵀx + b = 0) can separate the four XOR input-output pairs. A perceptron only creates linear decision boundaries. At least one hidden layer is needed.)
- A layer has 100 input features and 64 neurons. How many trainable parameters? (Answer: 100×64 (weights) + 64 (biases) = 6,464 parameters.)
- What does "deep" in deep learning refer to? (Answer: Depth = number of layers in the network. Deeper networks build hierarchical representations — early layers detect low-level patterns (edges), middle layers combine them into shapes, deep layers recognise high-level concepts (faces, objects).)
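The symmetry problem from the second question can be sketched directly — with identical (zero) weights every neuron in the layer produces the same output, while random initialisation does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

W_zero = np.zeros((5, 3))        # three neurons, all initialised to zero
out = np.maximum(0, x @ W_zero)  # every neuron computes the same output
print(out)  # [0. 0. 0.] — identical, and their gradients would match too

W_rand = rng.standard_normal((5, 3)) * 0.1  # random init breaks the symmetry
print(np.maximum(0, x @ W_rand))            # generally three different values
```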
On LumiChats
Every response LumiChats gives you involves billions of these neuron computations — forward passes through transformer layers where each attention head and MLP block performs exactly the weighted-sum + activation computation described here.