Responsible AI operationalises ethical principles into measurable requirements and engineering practices. Fairness metrics mathematically define what 'fair' means for a specific context — demographic parity, equalised odds, individual fairness. Each captures a different notion of fairness, and when base rates differ between groups they cannot all hold simultaneously (the fairness impossibility theorem). Data bias analysis identifies where discrimination enters the pipeline. Mitigation techniques apply pre-processing, in-processing, or post-processing corrections. Responsible AI frameworks (Google, Microsoft, IBM, Anthropic) translate these into engineering guidelines.
Fairness metrics — what does fair mean mathematically?
Computing AI fairness metrics with Fairlearn
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

try:
    from fairlearn.metrics import (demographic_parity_difference,
                                   equalized_odds_difference, MetricFrame)
    from fairlearn.postprocessing import ThresholdOptimizer
    from fairlearn.reductions import ExponentiatedGradient, DemographicParity
    FAIRLEARN = True
except ImportError:
    FAIRLEARN = False

# ── Simulate biased lending dataset ──
np.random.seed(42)
n = 2000
# Sensitive attribute: group A = majority, group B = minority
sensitive = np.random.choice(['A', 'B'], n, p=[0.7, 0.3])
# Income correlated with group (reflects historical inequality)
income = np.where(sensitive == 'A',
                  np.random.normal(60, 15, n),
                  np.random.normal(48, 18, n))
credit = np.where(sensitive == 'A',
                  np.random.normal(680, 60, n),
                  np.random.normal(640, 70, n))
debt = np.random.normal(0.35, 0.1, n)
X = np.column_stack([income, credit, debt])
# True creditworthiness (unbiased ground truth)
y_true = (income * 0.3 + credit * 0.02 - debt * 100 > 60).astype(int)
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    X, y_true, sensitive, test_size=0.3, random_state=42)

# Train model (will learn historical disparities)
clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# ── Manual fairness metrics ──
mask_A = s_test == 'A'
mask_B = s_test == 'B'

def approval_rate(predictions, mask):
    return predictions[mask].mean()

def tpr(predictions, labels, mask):  # True Positive Rate (recall)
    tp = ((predictions == 1) & (labels == 1) & mask).sum()
    fn = ((predictions == 0) & (labels == 1) & mask).sum()
    return tp / (tp + fn) if (tp + fn) > 0 else 0

def fpr(predictions, labels, mask):  # False Positive Rate
    fp = ((predictions == 1) & (labels == 0) & mask).sum()
    tn = ((predictions == 0) & (labels == 0) & mask).sum()
    return fp / (fp + tn) if (fp + tn) > 0 else 0

print("Fairness Metrics:")
print(f"{'Metric':<35} {'Group A':>10} {'Group B':>10} {'Difference':>12}")
print("-" * 70)
apr_A = approval_rate(y_pred, mask_A)
apr_B = approval_rate(y_pred, mask_B)
print(f"{'Approval rate (Demographic Parity)':<35} {apr_A:>10.3f} {apr_B:>10.3f} {apr_A-apr_B:>+12.3f}")
tpr_A = tpr(y_pred, y_test, mask_A)
tpr_B = tpr(y_pred, y_test, mask_B)
print(f"{'True Positive Rate (Equalised TPR)':<35} {tpr_A:>10.3f} {tpr_B:>10.3f} {tpr_A-tpr_B:>+12.3f}")
fpr_A = fpr(y_pred, y_test, mask_A)
fpr_B = fpr(y_pred, y_test, mask_B)
print(f"{'False Positive Rate':<35} {fpr_A:>10.3f} {fpr_B:>10.3f} {fpr_A-fpr_B:>+12.3f}")

# ── Fairlearn for automated fairness analysis ──
if FAIRLEARN:
    dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=s_test)
    eod = equalized_odds_difference(y_test, y_pred, sensitive_features=s_test)
    print(f"\nFairlearn: DPD = {dpd:.3f}, EOD = {eod:.3f}")
    # DPD = 0: perfect demographic parity. |DPD| > 0.1 = concerning

    # ── Post-processing mitigation: threshold adjustment ──
    # Adjust classification threshold per group to equalise fairness metric
    postprocess = ThresholdOptimizer(
        estimator=clf,
        constraints="equalized_odds",  # Equalise TPR and FPR across groups
        predict_method="predict_proba",
        objective="balanced_accuracy_score")
    postprocess.fit(X_train, y_train, sensitive_features=s_train)
    y_pred_fair = postprocess.predict(X_test, sensitive_features=s_test)
    dpd_after = demographic_parity_difference(y_test, y_pred_fair, sensitive_features=s_test)
    print(f"After post-processing: DPD = {dpd_after:.3f} (was {dpd:.3f})")
```

Types of bias and where they enter the pipeline
| Bias type | Where it enters | Example | Mitigation |
|---|---|---|---|
| Historical bias | Training data reflects past discrimination | Hiring data: 80% male candidates historically hired | Reweighting, causal analysis, new data collection |
| Representation bias | Some groups under-represented in training data | Facial recognition trained on 90% lighter skin tones | Data augmentation, diverse data collection |
| Measurement bias | Proxies used instead of true target variable | Using arrest rate (proxy) instead of crime rate (true target) | Careful feature selection, domain expert review |
| Aggregation bias | One model for all groups when they differ | Single medical model for diverse demographic groups | Group-specific models or features |
| Evaluation bias | Benchmark does not represent all groups | Image benchmark with no dark-skinned faces | Disaggregated evaluation metrics |
| Deployment bias | System used differently than intended | Hiring AI used for promotion decisions it was not designed for | Use-case scoping, deployment monitoring |
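The "Reweighting" mitigation listed for historical bias can be sketched in a few lines. This is a minimal illustration of Kamiran-Calders-style reweighing on hypothetical toy data (the group proportions and label rates below are invented for the example): each (group, label) cell is weighted so that group membership and label become statistically independent, before any model is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.choice(['A', 'B'], n, p=[0.7, 0.3])
# Historically biased labels: group B labelled positive far less often
y = np.where(group == 'A', rng.random(n) < 0.6, rng.random(n) < 0.3).astype(int)

# Reweighing: weight each (group, label) cell by expected/observed frequency,
# where "expected" assumes group and label are independent.
weights = np.ones(n)
for g in ['A', 'B']:
    for label in [0, 1]:
        cell = (group == g) & (y == label)
        expected = (group == g).mean() * (y == label).mean()
        observed = cell.mean()
        weights[cell] = expected / observed

# After reweighing, the weighted positive rate is equal across groups
for g in ['A', 'B']:
    m = group == g
    print(f"group {g}: weighted positive rate = "
          f"{np.average(y[m], weights=weights[m]):.3f}")
```

The resulting `weights` would be passed as `sample_weight` to a downstream classifier's `fit`; the model then sees a training distribution in which neither group is historically disadvantaged.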
The fairness impossibility theorem
Chouldechova (2017) and Kleinberg, Mullainathan and Raghavan (2017) proved that when base rates differ between groups, no imperfect classifier can simultaneously satisfy calibration and equalised error rates across groups — and demographic parity conflicts with both. You cannot be fair in all senses at once when underlying rates differ. This means every fairness-aware AI system makes a value choice about which type of fairness to prioritise — a technical decision with ethical and legal consequences. This choice should be made explicitly and transparently, not by accident.
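The tension is visible with simple arithmetic. If a classifier satisfies equalised odds (shared TPR and FPR across groups), its selection rate in a group with base rate p is TPR·p + FPR·(1−p), so differing base rates force differing selection rates — demographic parity fails. The TPR, FPR, and base-rate values below are illustrative, not from the lending example above.

```python
# Equalised odds holds: both groups share the same TPR and FPR
tpr, fpr = 0.85, 0.10
# ...but the groups' base rates of true positives differ
p_A, p_B = 0.60, 0.30

def selection_rate(p):
    # P(predicted positive) = TPR * P(y=1) + FPR * P(y=0)
    return tpr * p + fpr * (1 - p)

print(f"selection rate A = {selection_rate(p_A):.3f}")  # 0.550
print(f"selection rate B = {selection_rate(p_B):.3f}")  # 0.325
print(f"demographic parity difference = "
      f"{selection_rate(p_A) - selection_rate(p_B):.3f}")  # 0.225
```

The only way to equalise selection rates here is to use different error rates per group (or a trivial classifier with TPR = FPR), which is exactly the trade-off the theorem formalises.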
Practice questions
- A loan model has approval rates of 72% for Group A and 48% for Group B. Is this unfair? (Answer: Demographic parity difference = 72% - 48% = 24 percentage points. This is likely unfair, but context matters. If the groups have genuinely different creditworthiness distributions (different income, employment stability), some difference may be "fair" by equalised odds. The key question: does the model add discrimination BEYOND what the legitimate features already encode?)
- What is the difference between demographic parity and equalised odds? (Answer: Demographic parity: equal approval rates across groups regardless of actual qualification. Equalised odds: equal true positive rates AND false positive rates across groups — qualified individuals in both groups are equally likely to be approved, and unqualified individuals in both groups are equally likely to be rejected. Equalised odds is generally considered more fair as it conditions on actual qualifications.)
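A tiny worked example makes the distinction in this answer concrete. The two ten-person groups below are invented for illustration: the predictor approves exactly half of each group, so demographic parity holds exactly, yet its error rates differ sharply between groups, so equalised odds is violated.

```python
import numpy as np

# Toy data: 10 people per group, different fractions actually qualified
y_true_A = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # 60% qualified
y_true_B = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # 30% qualified

# Predictor approves exactly 5 per group -> demographic parity holds
y_pred_A = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 5 qualified approved
y_pred_B = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 3 qualified + 2 unqualified

for name, yt, yp in [('A', y_true_A, y_pred_A), ('B', y_true_B, y_pred_B)]:
    approval = yp.mean()                # demographic parity: equal
    tpr = yp[yt == 1].mean()            # equalised odds: unequal
    fpr = yp[yt == 0].mean()
    print(f"group {name}: approval={approval:.2f}  tpr={tpr:.2f}  fpr={fpr:.2f}")
```

Group A's unqualified members are never approved (FPR 0) while group B's are approved at 2/7, even though approval rates match — equal outcomes, unequal treatment conditioned on qualification.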
- What is historical bias in AI and why is it self-perpetuating? (Answer: Historical bias: training data reflects past discriminatory decisions. A model trained on historical hiring data learns that women and minorities were hired less — it perpetuates this. Self-perpetuating: biased AI makes biased decisions → those decisions generate new training data → next model is trained on biased outcomes → more biased decisions. Without intervention, AI amplifies historical injustice rather than correcting it.)
- Post-processing mitigation adjusts thresholds per demographic group. What is the risk? (Answer: Post-processing typically uses group membership as input to make different decisions for different groups. While this can equalise outcomes, it may violate anti-discrimination laws that prohibit using protected characteristics as inputs to hiring/lending decisions — even to correct bias. There is a legal and ethical tension between using group membership to mitigate bias vs using group membership as a decision factor.)
- A medical AI achieves 95% accuracy overall but 82% for Black patients. The overall metric hides this disparity. What evaluation practice should be standard? (Answer: Disaggregated evaluation: report metrics separately for each demographic subgroup (race, gender, age, socioeconomic status, geography). Aggregate accuracy can mask severe disparities. Best practice: define minimum acceptable performance for all subgroups as a deployment requirement, not just average performance. Model cards should include disaggregated metrics as standard.)
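Disaggregated evaluation as described in this answer can be sketched with a plain groupby; Fairlearn's `MetricFrame` (imported in the listing above) automates the same pattern for arbitrary metrics. The data here is simulated, with the minority group deliberately given a higher error rate to show how an aggregate figure hides the gap.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
group = rng.choice(['A', 'B'], n, p=[0.8, 0.2])
y_true = rng.integers(0, 2, n)
# Simulated predictions: markedly more errors for the minority group
err = np.where(group == 'A', rng.random(n) < 0.05, rng.random(n) < 0.18)
y_pred = np.where(err, 1 - y_true, y_true)

df = pd.DataFrame({'group': group, 'correct': y_true == y_pred})
by_group = df.groupby('group')['correct'].mean()
print(f"overall accuracy:     {df['correct'].mean():.3f}")
print(by_group.round(3))
print(f"worst-group accuracy: {by_group.min():.3f}")
```

Reporting `by_group.min()` alongside the aggregate operationalises the "minimum acceptable performance for all subgroups" deployment requirement: a release gate checks the worst group, not the average.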
On LumiChats
Anthropic publishes model cards for Claude documenting known limitations and demographic performance disparities. Understanding fairness metrics helps you critically evaluate these disclosures and hold AI companies accountable for the disparate impacts of their systems.