AI bias refers to systematic errors in AI model outputs that create unfair outcomes for certain groups of people — often related to race, gender, age, disability, or socioeconomic status. Bias enters through training data (reflecting historical inequalities), model architecture choices, evaluation metrics, and deployment decisions. Fairness in AI means designing and auditing systems to ensure their outputs are equitable across demographic groups.
Where bias comes from: the pipeline
Bias is not a single problem with a single fix — it enters at every stage of the AI development pipeline, often in hard-to-detect ways:
| Stage | Source of bias | Real-world example | Detection method |
|---|---|---|---|
| Data collection | Non-representative training data | Facial recognition trained mostly on light-skinned faces; error rates reached 34.7% for darker-skinned women vs 0.8% for lighter-skinned men (Buolamwini & Gebru, 2018) | Demographic breakdown of dataset; representation audits |
| Label collection | Human annotator bias | Sentiment labelers rated the same text as more negative when written in African American English | Inter-annotator agreement per demographic; bias in annotation guidelines |
| Feature engineering | Proxy variables encode protected attributes | ZIP code encodes race; using it in a loan model discriminates indirectly | Correlation analysis between features and protected attributes |
| Model training | Class imbalance; optimization for average accuracy | High overall accuracy masks 40% error rate on the minority class | Disaggregated evaluation metrics per subgroup |
| Evaluation | Benchmark datasets under-represent minority groups | A model scores 94% on a benchmark where 90% of test examples are from one group | Stratified evaluation; held-out subgroup test sets |
| Deployment | Distribution shift; feedback loops | A biased hiring model rejects minority candidates → less diverse training data → more bias next cycle | Monitoring production outputs; disparate impact audits |
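Several of the detection methods in the last column reduce to the same operation: disaggregating a metric by subgroup instead of reporting a single average. A minimal sketch (the function name and data below are hypothetical):

```python
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each demographic subgroup."""
    correct, total = defaultdict(int), defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(yt == yp)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical predictions: overall accuracy (62.5%) hides the gap.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(disaggregated_accuracy(y_true, y_pred, groups))
# → {'a': 1.0, 'b': 0.25}
```

A single aggregate number (62.5% here) would look mediocre but unremarkable; the per-group breakdown is what exposes the 40-point gap.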
Definitions of fairness — and why they conflict
One of the most important (and counterintuitive) results in algorithmic fairness is that many common definitions of fairness are mathematically incompatible — you can't satisfy all of them simultaneously.
| Fairness definition | What it requires | Mathematical condition | Problem with it |
|---|---|---|---|
| Demographic parity | Equal positive prediction rates across groups | P(Ŷ=1 \| A=0) = P(Ŷ=1 \| A=1) | Ignores actual base rates; can force unequal error rates |
| Equal opportunity | Equal true positive rates (recall) across groups | P(Ŷ=1 \| Y=1, A=0) = P(Ŷ=1 \| Y=1, A=1) | Can allow very different false positive rates |
| Equalized odds | Equal TPR and FPR across groups | Both TPR and FPR equal across A | Mathematically incompatible with calibration when base rates differ |
| Calibration | Predicted probabilities match actual outcomes equally for all groups | P(Y=1 \| score=s, A=0) = P(Y=1 \| score=s, A=1) | Incompatible with equalized odds when base rates differ |
| Individual fairness | Similar individuals should receive similar predictions | If d(x, x′) is small, \|f(x) − f(x′)\| should be small | Requires defining "similar" without bias |
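The first three definitions can all be checked from per-group confusion matrices. A sketch, with all names and data hypothetical:

```python
def group_rates(y_true, y_pred, groups, group):
    """Selection rate, TPR, and FPR for one demographic group."""
    tp = fp = fn = tn = 0
    for yt, yp, g in zip(y_true, y_pred, groups):
        if g != group:
            continue
        if yt == 1 and yp == 1:
            tp += 1
        elif yt == 0 and yp == 1:
            fp += 1
        elif yt == 1 and yp == 0:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    return {
        "selection_rate": (tp + fp) / n,            # demographic parity compares this
        "tpr": tp / (tp + fn) if tp + fn else 0.0,  # equal opportunity compares this
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # equalized odds also requires this
    }

# Hypothetical predictions for two groups:
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(group_rates(y_true, y_pred, groups, "m"))  # TPR 1.0, FPR 0.5
print(group_rates(y_true, y_pred, groups, "f"))  # TPR 0.5, FPR 0.0
```

On this toy data both demographic parity (selection rates 0.75 vs 0.25) and equal opportunity (TPR 1.0 vs 0.5) are violated, illustrating how the table's conditions translate into concrete per-group numbers.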
The impossibility theorem
Chouldechova (2017) and Kleinberg et al. (2016) proved that when base rates differ across groups, calibration, false positive rate parity, and false negative rate parity cannot all be achieved simultaneously. Any real system must choose which fairness criteria matter most for the specific application — there is no mathematically perfect solution.
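A small numeric example makes the theorem concrete: a score that is perfectly calibrated in both groups (P(Y=1 given score s) equals s) still yields different false positive rates when base rates differ. The function and numbers below are illustrative, not taken from any cited study:

```python
def fpr_calibrated(score_fracs, threshold=0.5):
    """False positive rate under perfect calibration:
    P(Y=1 | score=s) = s, predicting positive when s >= threshold.
    score_fracs maps each score to the fraction of the group holding it."""
    false_pos = sum(f * (1 - s) for s, f in score_fracs.items() if s >= threshold)
    negatives = sum(f * (1 - s) for s, f in score_fracs.items())
    return false_pos / negatives

# Hypothetical score distributions; both groups are perfectly calibrated.
group_a = {0.8: 0.5, 0.2: 0.5}   # base rate 0.50
group_b = {0.8: 0.2, 0.2: 0.8}   # base rate 0.32
print(round(fpr_calibrated(group_a), 3))  # → 0.2
print(round(fpr_calibrated(group_b), 3))  # → 0.059
```

Equalizing the two FPRs would require group-specific thresholds or miscalibrating the scores, which is exactly the trade-off the theorem forces.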
Practical mitigation techniques
| Technique | When applied | How it works | Tradeoff |
|---|---|---|---|
| Data resampling / reweighting | Pre-processing | Oversample underrepresented groups; assign higher loss weights to minority samples | Can improve parity but may reduce overall accuracy |
| Adversarial debiasing | In-training | Train a classifier to predict the target AND an adversary to predict the protected attribute from representations; penalize the adversary | Adds training complexity; can be unstable |
| Reranking / post-processing | Post-processing | Adjust decision thresholds per group to equalize specified metrics | Requires group labels at inference; legally sensitive in some jurisdictions |
| Counterfactual data augmentation | Pre-processing | Generate versions of training examples with protected attributes swapped; train on both | Effective for text/NLP; harder for structured data |
| RLHF with fairness constraints | LLM fine-tuning | Include fairness criteria in human feedback; penalize biased outputs in reward model | Expensive; hard to define "fair" consistently across annotators |
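As a concrete instance of the pre-processing row, reweighting in the style of Kamiran & Calders assigns each (group, label) cell a weight that makes group membership statistically independent of the label. A sketch with hypothetical data:

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight per (group, label) cell that makes group membership
    statistically independent of the label:
    w(g, y) = P(g) * P(y) / P(g, y)."""
    n = len(labels)
    p_g, p_y = Counter(groups), Counter(labels)
    p_gy = Counter(zip(groups, labels))
    return {
        (g, y): (p_g[g] / n) * (p_y[y] / n) / (p_gy[(g, y)] / n)
        for (g, y) in p_gy
    }

# Hypothetical data: positives from group "b" are underrepresented relative
# to independence, so they receive the largest weight (1.4), while
# overrepresented cells such as ("a", 1) get weights below 1.
groups = ["a"] * 8 + ["b"] * 2
labels = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
print(reweighing_weights(groups, labels))
```

The resulting weights are typically passed as per-sample loss weights during training, which is how this technique trades a little average accuracy for better parity.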
Fairness auditing tools
Open-source libraries: Fairlearn (Microsoft), AI Fairness 360 (IBM/AIF360), What-If Tool (Google), and Aequitas (U Chicago). For LLMs specifically: the BOLD benchmark measures social bias in open-ended text generation, and WinoBias measures gender stereotype bias in coreference resolution.
Practice questions
- What is the difference between disparate treatment and disparate impact in AI systems? (Answer: Disparate treatment: the AI explicitly uses a protected characteristic (race, gender, age) as an input to make decisions — intentional discrimination. Disparate impact: the AI uses neutral-seeming variables (zip code, name, education institution) that correlate with protected characteristics, producing discriminatory outcomes without explicit use of those characteristics. Both can be illegal under anti-discrimination law. Most AI bias cases involve disparate impact since training on historical data automatically captures proxy correlations.)
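In US employment contexts, disparate impact is commonly screened with the EEOC "four-fifths rule": a group's selection rate should be at least 80% of the highest group's rate. A sketch with hypothetical counts:

```python
def disparate_impact_ratio(selected, applicants):
    """Minimum selection rate divided by maximum selection rate.
    selected / applicants: {group: count}."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    return min(rates.values()) / max(rates.values())

# Hypothetical hiring outcomes: group "b" is selected at half the rate of "a".
ratio = disparate_impact_ratio(
    selected={"a": 50, "b": 25},
    applicants={"a": 100, "b": 100},
)
print(ratio)         # → 0.5
print(ratio >= 0.8)  # → False: fails the four-fifths screen
```

Note that the ratio only flags a disparity; it says nothing about which neutral-seeming proxy variable produced it.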
- What is the word embedding test for gender bias (WEAT) and what did studies find? (Answer: Word Embedding Association Test (WEAT): measures whether gendered words (he/she) are more similar to certain career or attribute words in embedding space. Caliskan et al. (2017) found: word2vec and GloVe associate 'programmer, engineer, scientist' more closely with male pronouns; 'nurse, teacher, librarian' more closely with female pronouns — mirroring US labour market statistics. These biases reflect historical data but are problematic when used in hiring/recommendation systems.)
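The association WEAT measures can be illustrated with cosine similarity on toy vectors. The 2-d "embeddings" below are entirely made up for illustration; real WEAT uses sets of target and attribute words from trained embeddings plus a permutation test for significance:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def association(word, male_words, female_words):
    """WEAT-style association: mean cosine similarity to the male attribute
    words minus mean similarity to the female ones. Positive = male-leaning."""
    m = sum(cosine(word, v) for v in male_words) / len(male_words)
    f = sum(cosine(word, v) for v in female_words) / len(female_words)
    return m - f

# Toy 2-d vectors standing in for real embeddings (hypothetical values):
he, she = [1.0, 0.1], [0.1, 1.0]
programmer, nurse = [0.9, 0.2], [0.2, 0.9]
print(association(programmer, [he], [she]) > 0)  # → True  (male-associated)
print(association(nurse, [he], [she]) < 0)       # → True  (female-associated)
```

In real embeddings the effect is the same but subtler: the career and attribute vectors sit measurably closer to one gendered direction than the other.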
- COMPAS is a recidivism prediction tool used in US courts. What bias issue did ProPublica identify? (Answer: ProPublica (2016) found COMPAS predicted Black defendants would reoffend at nearly twice the false positive rate of White defendants — Black defendants who did NOT reoffend were labelled high risk almost twice as often. Northpointe (COMPAS developer) argued the tool was 'fair' by a calibration/predictive-parity metric: a given risk score corresponded to similar actual reoffence rates across groups. This exemplifies the fairness impossibility theorem: ProPublica's definition (equal FPR) and Northpointe's definition (calibration) are mathematically incompatible when base rates differ.)
- What is 'technical debt' in AI fairness and why is it hard to address retroactively? (Answer: Technical debt: deploying a biased model creates a record of biased decisions (denied loans, failed interviews) that becomes the next round of training data if not carefully managed. The biased model's outputs may influence real-world distributions (denying loans to a community reduces economic activity, making future loan applications from that community look riskier). Retroactive debiasing requires: identifying the source of bias, retraining on corrected data, addressing real-world impacts of past decisions — none of which are technically straightforward.)
- What is 'intersectional fairness' and why is standard demographic fairness analysis insufficient? (Answer: Intersectional fairness (Crenshaw's intersectionality applied to ML): a model may be fair for Black individuals AND fair for women when evaluated separately, but unfair specifically for Black women. Standard fairness analysis evaluates one dimension at a time. Intersectional analysis evaluates all combinations of demographic groups. Practical challenge: small group sizes at intersections (e.g., 'non-binary Hispanic individuals') make statistical analysis unreliable. But ignoring intersections misses systematic harm to specific communities.)
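The failure mode in the last answer can be shown concretely: selection rates that look equal along each single axis can hide a stark disparity at the intersections. All data below is hypothetical and constructed to make the effect extreme:

```python
from collections import defaultdict

def selection_rates(y_pred, keys):
    """Selection rate and sample size per key (a group or a combination)."""
    pos, total = defaultdict(int), defaultdict(int)
    for yp, k in zip(y_pred, keys):
        total[k] += 1
        pos[k] += yp
    return {k: (pos[k] / total[k], total[k]) for k in total}

# Decisions that are "fair" along each single axis (every race and every
# gender is selected at exactly 50%) but maximally unfair at intersections:
y_pred = [0, 0, 1, 1, 1, 1, 0, 0]
attrs = [("black", "f"), ("black", "f"), ("black", "m"), ("black", "m"),
         ("white", "f"), ("white", "f"), ("white", "m"), ("white", "m")]
print(selection_rates(y_pred, [r for r, _ in attrs]))  # both races: 0.5
print(selection_rates(y_pred, [g for _, g in attrs]))  # both genders: 0.5
print(selection_rates(y_pred, attrs))                  # intersections: 0.0 or 1.0
```

Returning the sample size alongside each rate also surfaces the practical challenge from the answer: the intersectional cells here have only 2 examples each, too few for reliable statistics in real audits.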