Responsible AI operationalises ethical principles into measurable requirements and engineering practices. Fairness metrics mathematically define what 'fair' means for a specific context — demographic parity, equalised odds, individual fairness. Each captures a different notion of fairness, and when base rates differ between groups they cannot all hold simultaneously (the fairness impossibility theorem). Data bias analysis identifies where discrimination enters the pipeline. Mitigation techniques apply pre-processing, in-processing, or post-processing corrections. Responsible AI frameworks (Google, Microsoft, IBM, Anthropic) translate these into engineering guidelines.
Fairness metrics — what does fair mean mathematically?
Computing AI fairness metrics with Fairlearn
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

try:
    from fairlearn.metrics import (demographic_parity_difference,
                                   equalized_odds_difference, MetricFrame)
    from fairlearn.postprocessing import ThresholdOptimizer
    from fairlearn.reductions import ExponentiatedGradient, DemographicParity
    FAIRLEARN = True
except ImportError:
    FAIRLEARN = False

# ── Simulate biased lending dataset ──
np.random.seed(42)
n = 2000
# Sensitive attribute: group A = majority, group B = minority
sensitive = np.random.choice(['A', 'B'], n, p=[0.7, 0.3])
# Income correlated with group (reflects historical inequality)
income = np.where(sensitive == 'A',
                  np.random.normal(60, 15, n),
                  np.random.normal(48, 18, n))
credit = np.where(sensitive == 'A',
                  np.random.normal(680, 60, n),
                  np.random.normal(640, 70, n))
debt = np.random.normal(0.35, 0.1, n)
X = np.column_stack([income, credit, debt])
# True creditworthiness (unbiased ground truth)
y_true = (income * 0.3 + credit * 0.02 - debt * 100 > 60).astype(int)
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    X, y_true, sensitive, test_size=0.3, random_state=42)

# Train model (will learn historical disparities)
clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# ── Manual fairness metrics ──
mask_A = s_test == 'A'
mask_B = s_test == 'B'

def approval_rate(predictions, mask):
    return predictions[mask].mean()

def tpr(predictions, labels, mask):  # True Positive Rate (recall)
    tp = ((predictions == 1) & (labels == 1) & mask).sum()
    fn = ((predictions == 0) & (labels == 1) & mask).sum()
    return tp / (tp + fn) if (tp + fn) > 0 else 0

def fpr(predictions, labels, mask):  # False Positive Rate
    fp = ((predictions == 1) & (labels == 0) & mask).sum()
    tn = ((predictions == 0) & (labels == 0) & mask).sum()
    return fp / (fp + tn) if (fp + tn) > 0 else 0

print("Fairness Metrics:")
print(f"{'Metric':<35} {'Group A':>10} {'Group B':>10} {'Difference':>12}")
print("-" * 70)
apr_A = approval_rate(y_pred, mask_A)
apr_B = approval_rate(y_pred, mask_B)
print(f"{'Approval rate (Demographic Parity)':<35} {apr_A:>10.3f} {apr_B:>10.3f} {apr_A-apr_B:>+12.3f}")
tpr_A = tpr(y_pred, y_test, mask_A)
tpr_B = tpr(y_pred, y_test, mask_B)
print(f"{'True Positive Rate (Equalised TPR)':<35} {tpr_A:>10.3f} {tpr_B:>10.3f} {tpr_A-tpr_B:>+12.3f}")
fpr_A = fpr(y_pred, y_test, mask_A)
fpr_B = fpr(y_pred, y_test, mask_B)
print(f"{'False Positive Rate':<35} {fpr_A:>10.3f} {fpr_B:>10.3f} {fpr_A-fpr_B:>+12.3f}")

# ── Fairlearn for automated fairness analysis ──
if FAIRLEARN:
    dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=s_test)
    eod = equalized_odds_difference(y_test, y_pred, sensitive_features=s_test)
    print(f"\nFairlearn: DPD = {dpd:.3f}, EOD = {eod:.3f}")
    # DPD = 0: perfect demographic parity. |DPD| > 0.1 = concerning

    # ── Post-processing mitigation: threshold adjustment ──
    # Adjust classification threshold per group to equalise fairness metric
    postprocess = ThresholdOptimizer(
        estimator=clf,
        constraints="equalized_odds",  # Equalise TPR and FPR across groups
        predict_method="predict_proba",
        objective="balanced_accuracy_score")
    postprocess.fit(X_train, y_train, sensitive_features=s_train)
    y_pred_fair = postprocess.predict(X_test, sensitive_features=s_test)
    dpd_after = demographic_parity_difference(y_test, y_pred_fair, sensitive_features=s_test)
    print(f"After post-processing: DPD = {dpd_after:.3f} (was {dpd:.3f})")
```

Types of bias and where they enter the pipeline
| Bias type | Where it enters | Example | Mitigation |
|---|---|---|---|
| Historical bias | Training data reflects past discrimination | Hiring data: 80% male candidates historically hired | Reweighting, causal analysis, new data collection |
| Representation bias | Some groups under-represented in training data | Facial recognition trained on 90% lighter skin tones | Data augmentation, diverse data collection |
| Measurement bias | Proxies used instead of true target variable | Using arrest rate (proxy) instead of crime rate (true target) | Careful feature selection, domain expert review |
| Aggregation bias | One model for all groups when they differ | Single medical model for diverse demographic groups | Group-specific models or features |
| Evaluation bias | Benchmark does not represent all groups | Image benchmark with no dark-skinned faces | Disaggregated evaluation metrics |
| Deployment bias | System used differently than intended | Hiring AI used for promotion decisions it was not designed for | Use-case scoping, deployment monitoring |
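The "Reweighting" mitigation listed for historical bias can be sketched in a few lines. This is a minimal illustration of Kamiran-Calders-style reweighing on hypothetical toy data (the group proportions and label rates below are invented for the example): each (group, label) cell is weighted so that group membership and label become statistically independent, before any model is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.choice(['A', 'B'], n, p=[0.7, 0.3])
# Historically biased labels: group B labelled positive far less often
y = np.where(group == 'A', rng.random(n) < 0.6, rng.random(n) < 0.3).astype(int)

# Reweighing: weight each (group, label) cell by expected/observed frequency,
# where "expected" assumes group and label are independent.
weights = np.ones(n)
for g in ['A', 'B']:
    for label in [0, 1]:
        cell = (group == g) & (y == label)
        expected = (group == g).mean() * (y == label).mean()
        observed = cell.mean()
        weights[cell] = expected / observed

# After reweighing, the weighted positive rate is equal across groups
for g in ['A', 'B']:
    m = group == g
    print(f"group {g}: weighted positive rate = "
          f"{np.average(y[m], weights=weights[m]):.3f}")
```

The resulting `weights` would be passed as `sample_weight` to a downstream classifier's `fit`; the model then sees a training distribution in which neither group is historically disadvantaged.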
The fairness impossibility theorem
Chouldechova (2017) and Kleinberg, Mullainathan and Raghavan (2017) proved that when base rates differ between groups, no imperfect classifier can simultaneously satisfy calibration and equalised error rates across groups — and demographic parity conflicts with both. You cannot be fair in all senses at once when underlying rates differ. This means every fairness-aware AI system makes a value choice about which type of fairness to prioritise — a technical decision with ethical and legal consequences. This choice should be made explicitly and transparently, not by accident.
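The tension is visible with simple arithmetic. If a classifier satisfies equalised odds (shared TPR and FPR across groups), its selection rate in a group with base rate p is TPR·p + FPR·(1−p), so differing base rates force differing selection rates — demographic parity fails. The TPR, FPR, and base-rate values below are illustrative, not from the lending example above.

```python
# Equalised odds holds: both groups share the same TPR and FPR
tpr, fpr = 0.85, 0.10
# ...but the groups' base rates of true positives differ
p_A, p_B = 0.60, 0.30

def selection_rate(p):
    # P(predicted positive) = TPR * P(y=1) + FPR * P(y=0)
    return tpr * p + fpr * (1 - p)

print(f"selection rate A = {selection_rate(p_A):.3f}")  # 0.550
print(f"selection rate B = {selection_rate(p_B):.3f}")  # 0.325
print(f"demographic parity difference = "
      f"{selection_rate(p_A) - selection_rate(p_B):.3f}")  # 0.225
```

The only way to equalise selection rates here is to use different error rates per group (or a trivial classifier with TPR = FPR), which is exactly the trade-off the theorem formalises.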
Practice questions
- A loan model has approval rates of 72% for Group A and 48% for Group B. Is this unfair? (Answer: Demographic parity difference = 72% - 48% = 24 percentage points. This is likely unfair, but context matters. If the groups have genuinely different creditworthiness distributions (different income, employment stability), some difference may be "fair" by equalised odds. The key question: does the model add discrimination BEYOND what the legitimate features already encode?)
- What is the difference between demographic parity and equalised odds? (Answer: Demographic parity: equal approval rates across groups regardless of actual qualification. Equalised odds: equal true positive rates AND false positive rates across groups — qualified individuals in both groups are equally likely to be approved, and unqualified individuals in both groups are equally likely to be rejected. Equalised odds is generally considered more fair as it conditions on actual qualifications.)
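A tiny worked example makes the distinction in this answer concrete. The two ten-person groups below are invented for illustration: the predictor approves exactly half of each group, so demographic parity holds exactly, yet its error rates differ sharply between groups, so equalised odds is violated.

```python
import numpy as np

# Toy data: 10 people per group, different fractions actually qualified
y_true_A = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # 60% qualified
y_true_B = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # 30% qualified

# Predictor approves exactly 5 per group -> demographic parity holds
y_pred_A = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 5 qualified approved
y_pred_B = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 3 qualified + 2 unqualified

for name, yt, yp in [('A', y_true_A, y_pred_A), ('B', y_true_B, y_pred_B)]:
    approval = yp.mean()                # demographic parity: equal
    tpr = yp[yt == 1].mean()            # equalised odds: unequal
    fpr = yp[yt == 0].mean()
    print(f"group {name}: approval={approval:.2f}  tpr={tpr:.2f}  fpr={fpr:.2f}")
```

Group A's unqualified members are never approved (FPR 0) while group B's are approved at 2/7, even though approval rates match — equal outcomes, unequal treatment conditioned on qualification.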
- What is historical bias in AI and why is it self-perpetuating? (Answer: Historical bias: training data reflects past discriminatory decisions. A model trained on historical hiring data learns that women and minorities were hired less — it perpetuates this. Self-perpetuating: biased AI makes biased decisions → those decisions generate new training data → next model is trained on biased outcomes → more biased decisions. Without intervention, AI amplifies historical injustice rather than correcting it.)
- Post-processing mitigation adjusts thresholds per demographic group. What is the risk? (Answer: Post-processing typically uses group membership as input to make different decisions for different groups. While this can equalise outcomes, it may violate anti-discrimination laws that prohibit using protected characteristics as inputs to hiring/lending decisions — even to correct bias. There is a legal and ethical tension between using group membership to mitigate bias vs using group membership as a decision factor.)
- A medical AI achieves 95% accuracy overall but 82% for Black patients. The overall metric hides this disparity. What evaluation practice should be standard? (Answer: Disaggregated evaluation: report metrics separately for each demographic subgroup (race, gender, age, socioeconomic status, geography). Aggregate accuracy can mask severe disparities. Best practice: define minimum acceptable performance for all subgroups as a deployment requirement, not just average performance. Model cards should include disaggregated metrics as standard.)
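Disaggregated evaluation as described in this answer can be sketched with a plain groupby; Fairlearn's `MetricFrame` (imported in the listing above) automates the same pattern for arbitrary metrics. The data here is simulated, with the minority group deliberately given a higher error rate to show how an aggregate figure hides the gap.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
group = rng.choice(['A', 'B'], n, p=[0.8, 0.2])
y_true = rng.integers(0, 2, n)
# Simulated predictions: markedly more errors for the minority group
err = np.where(group == 'A', rng.random(n) < 0.05, rng.random(n) < 0.18)
y_pred = np.where(err, 1 - y_true, y_true)

df = pd.DataFrame({'group': group, 'correct': y_true == y_pred})
by_group = df.groupby('group')['correct'].mean()
print(f"overall accuracy:     {df['correct'].mean():.3f}")
print(by_group.round(3))
print(f"worst-group accuracy: {by_group.min():.3f}")
```

Reporting `by_group.min()` alongside the aggregate operationalises the "minimum acceptable performance for all subgroups" deployment requirement: a release gate checks the worst group, not the average.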
On LumiChats
Anthropic publishes model cards for Claude documenting known limitations and demographic performance disparities. Understanding fairness metrics helps you critically evaluate these disclosures and hold AI companies accountable for the disparate impacts of their systems.