A biological neuron receives input signals, processes them, and fires an output signal. An artificial neuron (perceptron) mimics this: it takes weighted inputs, sums them with a bias, applies an activation function, and produces an output. Frank Rosenblatt's 1958 perceptron started the neural network era. A single perceptron can only classify linearly separable problems; stack many neurons in layers and you get a neural network that can approximate almost any continuous function.
Real-life analogy: The voting committee member
A neuron is like a committee member who votes YES or NO. Each piece of evidence (input feature) has a different importance to them (weight). They add up all the weighted evidence, subtract their personal scepticism threshold (bias), and if the total exceeds their threshold — they vote YES (activate). Multiple committee members connected together in layers form the neural network.
Anatomy of a single neuron
z = pre-activation (weighted sum + bias). w = weights (learnable). x = inputs. b = bias (learnable). σ = activation function. The weights control how much each input influences the output. The bias shifts the activation threshold.
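To make the notation concrete, here is a minimal sketch of one neuron's computation with NumPy. The input, weight, and bias values are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5])   # inputs
w = np.array([0.4, -0.6, 0.2])  # weights (learnable)
b = 0.1                         # bias (learnable)

z = np.dot(w, x) + b            # pre-activation: z = w'x + b
a = 1 / (1 + np.exp(-z))        # activation: sigmoid(z)

print(z)  # 0.4 - 1.2 + 0.1 + 0.1 = -0.6
print(a)  # sigmoid(-0.6) ~ 0.354
```

The whole forward pass of a neuron is just these two lines: a dot product plus a bias, then a non-linearity.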
Single neuron implemented from scratch
import numpy as np

class Neuron:
    """A single artificial neuron with configurable activation."""

    def __init__(self, n_inputs: int, activation: str = 'relu'):
        # Weights: He initialisation — the sqrt(2/n) scale suits ReLU
        # and helps prevent vanishing/exploding gradients
        self.weights = np.random.randn(n_inputs) * np.sqrt(2.0 / n_inputs)
        self.bias = 0.0
        self.activation = activation

    def _activate(self, z: float) -> float:
        if self.activation == 'relu':
            return max(0.0, z)
        if self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-z))
        if self.activation == 'tanh':
            return np.tanh(z)
        if self.activation == 'linear':
            return z
        raise ValueError(f"Unknown activation: {self.activation}")

    def forward(self, x: np.ndarray) -> float:
        z = np.dot(self.weights, x) + self.bias  # weighted sum + bias
        return self._activate(z)

    def __repr__(self):
        return f"Neuron(inputs={len(self.weights)}, activation={self.activation})"
# Single neuron: "Is this email spam?" (5 features)
np.random.seed(42)
neuron = Neuron(n_inputs=5, activation='sigmoid')
email_features = np.array([0.8, 0.1, 0.9, 0.0, 0.5]) # word freq, link count, etc.
prob_spam = neuron.forward(email_features)
print(f"Neuron output: {prob_spam:.4f} {'spam' if prob_spam > 0.5 else 'ham'}")
# Manually set weights and bias (simulating what training learns)
neuron.weights = np.array([0.9, -0.2, 0.8, 0.3, 0.1]) # High weight for spam words
neuron.bias = -0.5 # High bar to classify as spam
prob_spam_tuned = neuron.forward(email_features)
print(f"Tuned neuron: {prob_spam_tuned:.4f}")
# A layer of 4 neurons working on the same input
class Layer:
    def __init__(self, n_inputs, n_neurons, activation='relu'):
        # Weight matrix: (n_inputs × n_neurons) — each column is one neuron
        self.W = np.random.randn(n_inputs, n_neurons) * np.sqrt(2.0 / n_inputs)
        self.b = np.zeros(n_neurons)

    def forward(self, x):
        z = x @ self.W + self.b  # shape: (batch, n_neurons)
        return np.maximum(0, z)  # ReLU applied element-wise
layer = Layer(n_inputs=5, n_neurons=4)
batch = np.random.randn(32, 5) # 32 examples, 5 features each
output = layer.forward(batch)
print(f"Layer output shape: {output.shape}")  # (32, 4)
Weights, biases, and what they learn
Weights control the strength and direction of each connection. A large positive weight means 'this input strongly activates this neuron.' A negative weight means 'this input suppresses this neuron.' Bias shifts the activation threshold independently of inputs — it is like the intercept in linear regression. A positive bias makes the neuron more likely to activate; negative makes it less likely.
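A small sketch (made-up values) of how weight sign and bias shift a sigmoid neuron's output:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))
x = 1.0  # a single active input feature

excite = sigmoid(+2.0 * x)       # large positive weight: strong activation
inhibit = sigmoid(-2.0 * x)      # negative weight: suppressed activation
eager = sigmoid(0.5 * x + 1.5)   # positive bias raises the baseline
wary = sigmoid(0.5 * x - 1.5)    # negative bias lowers it

print(f"{excite:.3f} {inhibit:.3f} {eager:.3f} {wary:.3f}")
```

With the same input, the positive weight pushes the output towards 1, the negative weight towards 0, and the bias shifts the output without touching the input at all.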
| Component | Symbol | Initialised to | What it controls |
|---|---|---|---|
| Weight | w | Small random (Xavier, He) | How much each input influences this neuron |
| Bias | b | Zero | Baseline activation level — shifts decision boundary |
| Activation | σ(z) | Fixed (design choice) | Introduces non-linearity — enables learning complex patterns |
| Pre-activation | z = wᵀx + b | Computed | Raw signal before non-linearity |
Why biases are essential
Without bias, every neuron's decision boundary passes through the origin — severely limiting expressiveness. With bias, the neuron can place its decision boundary anywhere in feature space. Example: a ReLU neuron without bias fires for z > 0, i.e., wᵀx > 0. With bias b = -3, it fires only when wᵀx > 3, so it can model any threshold.
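A quick sketch of that effect, with illustrative values:

```python
import numpy as np

w = np.array([1.0, 1.0])
x = np.array([2.0, 0.5])  # w'x = 2.5

relu = lambda z: max(0.0, z)

no_bias = relu(np.dot(w, x))          # fires whenever w'x > 0
with_bias = relu(np.dot(w, x) - 3.0)  # bias b = -3: fires only when w'x > 3

print(no_bias)    # 2.5 — the input clears the origin threshold
print(with_bias)  # 0.0 — the same input fails the shifted threshold
```

Same weights, same input — the bias alone decides where the firing threshold sits.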
From one neuron to a network — depth vs width
Width = number of neurons per layer. More neurons → more patterns captured at each level of abstraction. Depth = number of layers. More layers → higher-level features built from lower-level ones (edges → shapes → objects in vision). For the same parameter count, deeper networks tend to outperform wider ones on most tasks. This is why it is called 'deep' learning rather than 'wide' learning.
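For intuition about parameter budgets, here is a sketch comparing a wide and a deep fully connected network. The layer sizes are hypothetical, chosen only to illustrate the counting:

```python
def dense_params(sizes):
    """Total weights + biases for fully connected layers of the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

wide = dense_params([100, 512, 1])         # one wide hidden layer
deep = dense_params([100, 64, 64, 64, 1])  # three narrower hidden layers

print(wide)  # 52225 parameters
print(deep)  # 14849 parameters
```

The deeper network here uses far fewer parameters yet composes three levels of features, which is the trade-off the paragraph above describes.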
Practice questions
- A neuron has weights [0.5, -0.3, 0.8] and bias -0.2. Input is [1, 2, 0]. Compute the pre-activation z. (Answer: z = 0.5×1 + (-0.3)×2 + 0.8×0 + (-0.2) = 0.5 - 0.6 + 0 - 0.2 = -0.3)
- What would happen if all weights were initialised to the same value (e.g., all zeros)? (Answer: the symmetry problem — all neurons in a layer compute identical outputs and receive identical gradients, so they update to the same values and remain identical forever. Random initialisation breaks this symmetry so each neuron learns different features.)
- Why can a single perceptron not solve the XOR problem? (Answer: XOR is not linearly separable — no single hyperplane (wᵀx + b = 0) can separate the four XOR input-output pairs. A perceptron only creates linear decision boundaries. At least one hidden layer is needed.)
- A layer has 100 input features and 64 neurons. How many trainable parameters? (Answer: 100×64 (weights) + 64 (biases) = 6,464 parameters.)
- What does "deep" in deep learning refer to? (Answer: Depth = number of layers in the network. Deeper networks build hierarchical representations — early layers detect low-level patterns (edges), middle layers combine them into shapes, deep layers recognise high-level concepts (faces, objects).)
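The symmetry problem from the second question can be sketched directly — with identical (zero) weights every neuron in the layer produces the same output, while random initialisation does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

W_zero = np.zeros((5, 3))        # three neurons, all initialised to zero
out = np.maximum(0, x @ W_zero)  # every neuron computes the same output
print(out)  # [0. 0. 0.] — identical, and their gradients would match too

W_rand = rng.standard_normal((5, 3)) * 0.1  # random init breaks the symmetry
print(np.maximum(0, x @ W_rand))            # generally three different values
```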
On LumiChats
Every response LumiChats gives you involves billions of these neuron computations — forward passes through transformer layers where each attention head and MLP block performs exactly the weighted-sum + activation computation described here.