Model auditing applies rigorous review processes to AI systems — examining training data, architecture, performance claims, fairness, robustness, and deployment practices. Model documentation (model cards, datasheets, system cards) creates the paper trail that makes auditing possible. Third-party auditing is analogous to financial auditing — an independent reviewer validates AI developers' claims. Accountability mechanisms determine who is responsible when AI causes harm and provide recourse for affected individuals. Together, these constitute the infrastructure of trustworthy AI.
What a model audit covers
Model card template and automated audit checks
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModelCard:
    """
    Standardised model documentation (Mitchell et al. 2019 / HuggingFace standard).
    """
    # Identity
    model_name: str
    version: str
    model_type: str  # "classifier", "language model", "regression"
    organisation: str
    contact: str
    license: str
    # Intended use
    primary_use_cases: List[str] = field(default_factory=list)
    out_of_scope_uses: List[str] = field(default_factory=list)
    # Training data
    training_data_description: str = ""
    training_data_size: str = ""
    data_sources: List[str] = field(default_factory=list)
    known_data_limitations: List[str] = field(default_factory=list)
    # Performance (MUST include disaggregated metrics)
    overall_metrics: Dict[str, float] = field(default_factory=dict)
    # attribute -> group -> metric -> value, e.g. {"gender": {"female": {"f1": 0.70}}}
    disaggregated_metrics: Dict[str, Dict[str, Dict[str, float]]] = field(default_factory=dict)
    # Fairness and bias
    fairness_analysis: str = ""
    known_biases: List[str] = field(default_factory=list)
    # Safety and limitations
    known_limitations: List[str] = field(default_factory=list)
    safety_measures: List[str] = field(default_factory=list)
    # Evaluation details
    evaluation_datasets: List[str] = field(default_factory=list)
    evaluation_methodology: str = ""

    def completeness_check(self) -> Dict[str, bool]:
        """Automated check for required documentation fields."""
        checks = {
            "has_training_data_description": bool(self.training_data_description),
            "has_disaggregated_metrics": bool(self.disaggregated_metrics),
            "has_known_limitations": bool(self.known_limitations),
            "has_out_of_scope_uses": bool(self.out_of_scope_uses),
            "has_fairness_analysis": bool(self.fairness_analysis),
            "has_known_biases": bool(self.known_biases),
            "has_safety_measures": bool(self.safety_measures),
            "has_contact_information": bool(self.contact),
        }
        return checks

    def audit_score(self) -> float:
        """Completeness as a percentage of required fields that are filled in."""
        checks = self.completeness_check()
        return sum(checks.values()) / len(checks) * 100
# Example model card for a hiring AI
hiring_card = ModelCard(
    model_name="ResumeScreener-v3",
    version="3.2.1",
    model_type="binary classifier",
    organisation="HireX Technologies",
    contact="aiethics@hirex.com",
    license="Proprietary",
    primary_use_cases=["Initial resume screening for software engineering roles"],
    out_of_scope_uses=["Promotion decisions", "Compensation setting", "Performance review"],
    training_data_description="500k anonymised resumes with hiring outcomes 2018-2022",
    data_sources=["Internal hiring records", "LinkedIn profiles (anonymised)"],
    known_data_limitations=[
        "Training data reflects historical hiring patterns (80% male, 70% top-10 university)",
        "Over-represented: US candidates, English-language resumes",
    ],
    overall_metrics={"precision": 0.78, "recall": 0.71, "f1": 0.74},
    disaggregated_metrics={
        "gender": {
            "male": {"f1": 0.77, "approval_rate": 0.68},
            "female": {"f1": 0.70, "approval_rate": 0.61},  # Disparity flagged
        },
        "university_tier": {
            "top_10": {"f1": 0.82, "approval_rate": 0.75},
            "other": {"f1": 0.64, "approval_rate": 0.52},  # Large gap
        },
    },
    known_biases=["Gender gap: 7 percentage point disparity in approval rates"],
    safety_measures=["Human review for all borderline cases", "Quarterly bias audit"],
    known_limitations=["Not validated for non-engineering roles", "Underperforms for career changers"],
)
print("Model Card Completeness Audit:")
for check, passed in hiring_card.completeness_check().items():
status = "✓" if passed else "✗"
print(f" {status} {check}")
print(f"
Audit score: {hiring_card.audit_score():.0f}/100")Third-party auditing and incident reporting
Third-party AI auditing: Independent assessment of an AI system's claims about safety, fairness, and capability, analogous to financial auditing. Auditors review training data, model architecture, performance claims, fairness analysis, and deployment practices. Current challenge: AI auditing lacks the standardised methodology and accreditation that financial auditing has built up through established professional standards. The EU AI Act requires conformity assessment for high-risk AI systems, with third-party assessment by notified bodies for certain categories such as remote biometric identification.
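As a concrete illustration, one step an independent auditor can automate is re-running an evaluation and comparing the measured numbers against the developer's published claims. The sketch below reuses the hiring_card example above; the tolerance value and the independent_eval figures are illustrative assumptions, not real audit results.

def verify_performance_claims(claimed: dict, measured: dict, tolerance: float = 0.02) -> dict:
    """Compare published metrics against an independent re-evaluation.

    For each claimed metric, report the independently measured value and whether
    it falls within `tolerance` of the claim. Tolerance is an illustrative choice.
    """
    results = {}
    for metric, claimed_value in claimed.items():
        measured_value = measured.get(metric)
        if measured_value is None:
            results[metric] = {"status": "not reproduced", "claimed": claimed_value}
            continue
        results[metric] = {
            "claimed": claimed_value,
            "measured": measured_value,
            "within_tolerance": abs(claimed_value - measured_value) <= tolerance,
        }
    return results

# Hypothetical audit: the auditor re-evaluates ResumeScreener-v3 on their own held-out set
independent_eval = {"precision": 0.75, "recall": 0.69, "f1": 0.72}
for metric, finding in verify_performance_claims(hiring_card.overall_metrics, independent_eval).items():
    print(metric, finding)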
- AI Incident Database (AIID): Public database of reported AI failures and harms — searchable by sector, harm type, and company. Used for learning from past failures and identifying systemic risks.
- Bug bounty programs: Reward researchers for finding safety issues. OpenAI, Anthropic, and Google all run programs where security researchers can report jailbreaks, safety bypasses, and harmful output patterns for cash rewards.
- Red teaming: Systematic adversarial testing by dedicated teams trying to find safety failures before deployment. Frontier labs have committed to pre-deployment red teaming, and bodies such as the UK AI Safety Institute conduct independent pre-deployment evaluations of frontier models.
- Post-deployment monitoring: Continuous measurement of model outputs in production for safety violations, performance drift, and disparate impacts, with triggers for human review and model updates (a minimal monitoring sketch follows this list).
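A minimal sketch of the post-deployment monitoring item above, assuming the deployed screener logs each decision with a group label: track approval rates over a rolling window and trigger human review when the gap between groups crosses a threshold. The window size, threshold, and group labels are illustrative assumptions.

from collections import deque

class DisparityMonitor:
    """Rolling check on approval-rate disparity between groups in production.

    window and max_gap are illustrative assumptions, not regulatory values.
    """
    def __init__(self, window: int = 1000, max_gap: float = 0.05):
        self.decisions = deque(maxlen=window)  # (group, approved) pairs
        self.max_gap = max_gap

    def record(self, group: str, approved: bool) -> None:
        self.decisions.append((group, approved))

    def approval_rates(self) -> dict:
        totals, approvals = {}, {}
        for group, approved in self.decisions:
            totals[group] = totals.get(group, 0) + 1
            approvals[group] = approvals.get(group, 0) + int(approved)
        return {g: approvals[g] / totals[g] for g in totals}

    def needs_review(self) -> bool:
        """True if the gap between best- and worst-treated group exceeds max_gap."""
        rates = self.approval_rates()
        if len(rates) < 2:
            return False
        return max(rates.values()) - min(rates.values()) > self.max_gap

# Example: feed production decisions and check the trigger
monitor = DisparityMonitor(window=500, max_gap=0.05)
monitor.record("group_a", True)
monitor.record("group_b", False)
if monitor.needs_review():
    print("Disparity threshold exceeded: route to human review and schedule a bias audit")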
Practice questions
- A model card shows 82% F1 for majority group and 64% F1 for minority group. What should happen? (Answer: This 18-point gap is a significant fairness concern. Action: (1) Investigate the cause (training data imbalance, proxy discrimination). (2) Set a minimum acceptable performance floor for the minority group. (3) Consider the deployment use case; for hiring or lending, a disparity this large invites disparate-impact liability. (4) Do not deploy until the gap is addressed or the use case is reconsidered. A minimal version of this check is sketched after these questions.)
- Why is AI auditing more challenging than financial auditing? (Answer: No standardised methodology yet. AI systems are complex and domain-specific. Proprietary models may not share weights. Performance claims are context-dependent (accuracy varies by input distribution). No universal accreditation for AI auditors. Evaluation benchmarks change. Financial auditing has 100+ years of standards; AI auditing is nascent.)
- What is the difference between a model card and a datasheet for datasets? (Answer: Model card (Mitchell 2019): documents the MODEL — intended use, performance, limitations, fairness. Datasheet for datasets (Gebru 2018): documents the TRAINING DATA — collection process, demographics, known biases, conditions of use. Both are needed for full AI transparency. Datasheet is upstream (data); model card is downstream (model).)
- An AI incident database entry shows a hiring AI rejected qualified candidates from HBCUs at 3× the rate of other schools. What regulatory body would investigate this in the US? (Answer: EEOC (Equal Employment Opportunity Commission) would investigate for disparate impact under Title VII. If the AI is used in federally contracted hiring, OFCCP (Office of Federal Contract Compliance Programs) may also investigate. The company would need to demonstrate the screening criteria are job-related and consistent with business necessity — or face discrimination claims.)
- A company publishes a model card but omits disaggregated performance metrics for gender and race. Is this acceptable? (Answer: Increasingly, no. Under the EU AI Act, high-risk AI must include fairness analysis. Model card best practice (Mitchell 2019, Google) requires disaggregated metrics. Voluntary commitments from major AI labs now include disaggregated reporting. Omitting these metrics when they are technically available is arguably misleading and makes the model card insufficient for meaningful audit.)
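The reasoning in the first practice question can be written as a simple pre-deployment guard over the disaggregated metrics in the model card above. The gap and floor thresholds here are illustrative assumptions; an organisation would set them per use case and jurisdiction.

def fairness_gate(card: ModelCard, metric: str = "f1",
                  max_gap: float = 0.05, min_floor: float = 0.70) -> list:
    """Return blocking findings if any group falls below the floor or the
    within-attribute gap exceeds max_gap. Thresholds are illustrative."""
    findings = []
    for attribute, groups in card.disaggregated_metrics.items():
        values = {g: m[metric] for g, m in groups.items() if metric in m}
        if not values:
            continue
        worst_group, worst = min(values.items(), key=lambda kv: kv[1])
        gap = max(values.values()) - worst
        if worst < min_floor:
            findings.append(f"{attribute}/{worst_group}: {metric}={worst:.2f} below floor {min_floor}")
        if gap > max_gap:
            findings.append(f"{attribute}: {metric} gap of {gap:.2f} exceeds {max_gap}")
    return findings

for finding in fairness_gate(hiring_card):
    print("BLOCK:", finding)  # flags the 0.82 vs 0.64 university-tier gap from question 1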
On LumiChats
Anthropic publishes model cards, safety policies, and research papers documenting Claude's capabilities and limitations. The system card for Claude describes training methodology, safety measures, known limitations, and red teaming results. This transparency is the foundation of accountable AI development.