AI Governance & Compliance Mastery
Complete Study Guide + Reference
Including Governance Maturity Model, Failure Modes, and Citations
Brought to you by SpanForge | getspanforge.com | May 2026
Quick Navigation
- Introduction
- Part 1: Regulatory Frameworks
- Part 2: Technical Governance Architecture
- Part 3: Observability as Strategic Infrastructure
- Part 4: Organizational Implementation
- Part 5: Governance Maturity Model
- Part 6: Common Failure Modes
- Part 7: Scenario-Based Workbook
- Appendix A: Self-Assessment Checklist
- Appendix B: Full References & Citations
Introduction
Observability does not equal compliance evidence.
This foundational insight shapes everything that follows. A system can generate terabytes of logs, alerts, and dashboards — and still fail a regulatory audit because none of it constitutes proof that risk was assessed, governed, and responded to.
This guide closes that gap. It covers six regulatory frameworks, the technical architecture needed for governance-native AI, organizational implementation, and the failure modes that destroy compliance programs in production.
By the end, you will have:
- Comprehensive framework knowledge (6 standards)
- A maturity model to assess your current state
- Real failure modes to learn from and avoid
- Self-assessment tools
- Complete academic and regulatory citations
Part 1: Regulatory Frameworks (6 Standards)
1. EU AI Act (Articles 6–29)
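Status: Entered into force on 1 August 2024. Obligations apply in phases: prohibited practices from February 2025, general-purpose AI obligations from August 2025, most high-risk requirements from August 2026, and high-risk systems embedded in regulated products from August 2027.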
Key Articles:
| Article | Requirement |
|---|---|
| Article 6 | High-risk AI classification (Annex III) |
| Articles 8–15 | Risk management, data governance, record-keeping, human oversight |
| Article 13 | Transparency and provision of information to deployers |
| Article 14 | Meaningful human oversight for high-risk decisions |
Annex III High-Risk Categories:
| Category | Examples |
|---|---|
| Education & Training | Admissions screening, exam scoring |
| Employment & Labor | Hiring screeners, performance AI |
| Credit & Banking | Loan approval, credit scoring |
| Law Enforcement | Crime detection, risk scoring |
| Immigration & Border | Visa, asylum decisions |
| Healthcare & Medicine | Diagnostics, treatment recommendations |
| Critical Infrastructure | Electricity, water, transport control |
| Benefits & Social Services | Eligibility decisions |
| Administration of Justice | Bail, sentencing recommendations |
Fines:
| Violation | Maximum Fine |
|---|---|
| Prohibited AI practices (Article 5) | €35M or 7% of global annual turnover |
| Violations of high-risk, governance, and transparency obligations | €15M or 3% of global annual turnover |
| Supplying incorrect or misleading information to authorities | €7.5M or 1% of global annual turnover |
Key insight: The EU AI Act demands auditable AI: proof that you assessed risk before deployment, monitored it in production, and escalated when you found problems.
2. GDPR (Articles 5, 9, 13–14, 32)
Status: In force since May 2018. Active enforcement. AI-specific enforcement increasing significantly.
GDPR is the EU's data protection law. It constrains any AI operation that processes personal data.
Article 5 (six core principles):
| Principle | What It Means for AI |
|---|---|
| Lawfulness, Fairness, Transparency | Document lawful basis; inform users of AI use |
| Purpose Limitation | Data collected for one purpose cannot be reused without further legal analysis |
| Data Minimisation | Only process what is strictly necessary |
| Accuracy | Keep data accurate; allow corrections to AI outputs |
| Storage Limitation | Delete personal data when no longer needed |
| Integrity & Confidentiality | Encrypt, access-control, and audit all processing |
Article 9 — Special category data restrictions: health, racial/ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, biometric data, sexual orientation.
Article 32 — Security of processing: encryption, access control, regular testing.
Fines:
| Violation | Maximum Fine |
|---|---|
| Core principle violations (Articles 5, 6, 9) | €20M or 4% of global annual turnover |
| Data subject rights violations (Articles 12–22) | Up to €20M or 4% of global annual turnover |
| Procedural violations | €10M or 2% of global annual turnover |
3. HIPAA (Safe Harbor)
Status: US federal law. Applies to covered entities and business associates handling Protected Health Information (PHI).
The 18 Safe Harbor Identifiers (must be removed or masked):
| # | Identifier | # | Identifier |
|---|---|---|---|
| 1 | Names | 10 | Account numbers |
| 2 | Geographic data (sub-state) | 11 | Certificate/license numbers |
| 3 | Dates (except year) | 12 | Vehicle identifiers |
| 4 | Phone numbers | 13 | Device identifiers |
| 5 | Fax numbers | 14 | Web URLs |
| 6 | Email addresses | 15 | IP addresses |
| 7 | Social Security numbers | 16 | Biometric identifiers |
| 8 | Medical record numbers | 17 | Full-face photographs |
| 9 | Health plan beneficiary numbers | 18 | Any other unique identifying number |
The LLM Re-Identification Risk: Standard de-identification removes the 18 identifiers. But LLMs can reconstruct identity from prose — combining diagnosis, location, age range, and dates in a way that uniquely identifies a patient. Safe Harbor alone does not address this risk.
Breach Notification:
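- Notify affected individuals: without unreasonable delay, and no later than 60 days after discovery
- Notify HHS: within the same 60-day window for breaches affecting 500 or more individuals; for smaller breaches, no later than 60 days after the end of the calendar year in which the breach was discovered
- Media notification: required when more than 500 residents of a state or jurisdiction are affected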
4. SOC 2 (Trust Service Criteria)
Status: AICPA attestation standard. Commonly requested by enterprise B2B customers during vendor review, and increasingly expected for AI systems as well.
The 5 Trust Service Categories:
| Category | What It Covers |
|---|---|
| Security (CC) | Access controls, change management, incident response — required for all SOC 2 |
| Availability (A1) | System availability commitments and recovery objectives |
| Processing Integrity (PI) | Processes data completely, accurately, and in a timely manner |
| Confidentiality (C) | Information designated confidential is protected |
| Privacy (P) | Personal information collected and used per commitments |
Key Criteria for AI Systems:
| Criteria | What It Covers for AI |
|---|---|
| CC6 | Who can access your AI systems and training data? |
| CC7 | How do you detect anomalies in AI behavior? |
| CC8 | How do you govern AI model updates and deployments? |
| CC9 | How do you assess and mitigate AI-specific risks? |
What Auditors Look For: Evidence of controls being followed — not just documentation of controls. Audit logs, incident reports, and change management records showing actual enforcement.
5. ISO 42001:2023
Status: Published December 2023. AI Management Systems standard. Early adoption phase — industry interpretation still maturing. Auditor approaches vary.
Structure (Plan-Do-Check-Act):
| Phase | Sections | Key Activities |
|---|---|---|
| Plan | 4–6 | Context, leadership, risk & impact assessment |
| Do | 7–8 | Support, operational controls, AI system lifecycle |
| Check | 9 | Performance evaluation, monitoring, internal audit |
| Act | 10 | Improvement, nonconformity management |
Key Requirements:
| Section | Requirement |
|---|---|
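| 6.1 | Actions to address risks and opportunities, including AI risk and impact assessment |
| 8.2 | AI risk assessment |
| 8.3 | AI risk treatment |
| 8.4 | AI system impact assessment |
| 9.1 | Monitoring, measurement, analysis, and evaluation |
| Annex A.6 | AI system life cycle controls (design, development, operation) |
| Annex A.10 | Third-party and customer relationship controls |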
Important caveat: ISO 42001 is a management system standard — it doesn't prescribe specific technical controls. It requires a documented, systematic approach to managing AI risks across the organization.
6. NIST AI Risk Management Framework (AI RMF 1.0)
Status: Published January 2023. Voluntary US framework. Widely referenced in contracts, procurement, and policy.
The Four Core Functions:
| Function | What It Means |
|---|---|
| Govern | Establish organizational strategy for AI risk; define accountability; create policies; cultivate culture |
| Map | Categorize AI systems by context and risk; identify stakeholders; document purpose, limitations, assumptions |
| Measure | Quantify AI risks (bias, reliability, security, explainability); establish metrics; test and evaluate |
| Manage | Implement risk responses; prioritize residual risks; plan for incidents; communicate with stakeholders |
Key insight: The NIST AI RMF is designed to complement other frameworks, not replace them. It maps well to the EU AI Act, ISO 42001, and SOC 2.
Part 2: Technical Governance Architecture
The Three-Layer Governance Stack
Governance-native AI requires infrastructure across three layers:
Layer 3: Risk Management
├── Human-in-the-loop escalation
├── Behavioral drift detection
├── Alert routing and SLA management
└── Framework compliance mapping
Layer 2: Evidence Generation
├── HMAC-SHA256 audit chaining
├── WORM storage
├── Framework-mapped evidence bundles
└── Auditor-ready report generation
Layer 1: Instrumentation
├── Decision capture (every AI output)
├── PII detection and redaction
├── Secret scanning (API keys, tokens)
├── Policy enforcement gates
└── OpenTelemetry-aligned spans
HMAC-SHA256 Cryptographic Chaining
The Problem: Standard audit logs can be edited. A database admin can change a log entry, making logs alone insufficient to prove records haven't been tampered with.
The Solution: Each record carries a keyed hash (HMAC) computed over its own content plus the previous record's hash, using a secret key held outside the log store:
Record N hash = HMAC-SHA256(key, Record N-1 hash + Record N content)
If any record is modified, its recomputed hash no longer matches the stored one, and every later hash that depends on it fails too. The chain breaks at the first altered record, so tampering is detected the next time the chain is verified.
What this proves to auditors:
- Records have not been modified since creation
- The sequence of events is accurate
- No records have been deleted from the middle of the chain
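The chaining rule above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production design: the function names, the all-zero genesis hash, the JSON canonicalization, and the record layout are assumptions made for this example, and key management (where the HMAC key lives and who can read it) is out of scope.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def sign_record(key: bytes, content: dict, prev_hash: str) -> str:
    """HMAC-SHA256 over the previous hash plus a canonical JSON form of the content."""
    payload = prev_hash.encode() + json.dumps(content, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_record(chain: list, key: bytes, content: dict) -> None:
    """Append a record whose hash covers its content and the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "content": content,
        "prev_hash": prev_hash,
        "hash": sign_record(key, content, prev_hash),
    })


def verify_chain(chain: list, key: bytes) -> int:
    """Return the index of the first record that fails verification, or -1 if intact."""
    prev_hash = GENESIS
    for i, record in enumerate(chain):
        expected = sign_record(key, record["content"], prev_hash)
        linked = record["prev_hash"] == prev_hash
        if not linked or not hmac.compare_digest(record["hash"], expected):
            return i
        prev_hash = record["hash"]
    return -1
```

Verification recomputes every hash from the start, so an edited or removed record shows up as the first index where stored and recomputed values disagree. A chain like this does not by itself reveal records trimmed from the end; pairing it with WORM storage, or periodically anchoring the latest hash somewhere else, covers that case.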
PII Detection: Three-Layer Approach
| Layer | Method | What It Catches |
|---|---|---|
| Layer 1: Pattern Matching | Regular expressions | Structured PII: SSNs, phones, emails, credit cards |
| Layer 2: Model-Based | Named entity recognition (NER) | Names, organizations, locations in unstructured prose |
| Layer 3: Entropy Analysis | Statistical analysis | API keys, tokens, passwords, high-entropy secrets |
The Gap Standard Approaches Miss: Context-dependent re-identification — where a combination of non-PII attributes (age range + location + diagnosis + dates) uniquely identifies an individual. Requires semantic analysis beyond pattern matching. Particularly critical for HIPAA compliance with LLM outputs.
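As a rough illustration of Layers 1 and 3, the sketch below pairs a few regular expressions with a Shannon entropy check. The patterns, the 20-character minimum, and the 4.0-bit threshold are assumptions chosen for the example, not recommended values; Layer 2 (NER) and the semantic re-identification gap are deliberately left out because they need a trained model.

```python
import math
import re
from collections import Counter

# Layer 1: illustrative patterns for structured PII (US-style formats only)
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_structured_pii(text: str) -> list:
    """Return (label, match) pairs for every pattern hit in the text."""
    return [(label, m.group()) for label, rx in PATTERNS.items()
            for m in rx.finditer(text)]


def shannon_entropy(token: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(token)
    return -sum((n / len(token)) * math.log2(n / len(token))
                for n in counts.values())


def find_high_entropy_tokens(text: str, min_len: int = 20,
                             threshold: float = 4.0) -> list:
    """Layer 3: flag long tokens whose character distribution looks random."""
    tokens = re.findall(r"[A-Za-z0-9_\-+/=]+", text)
    return [t for t in tokens
            if len(t) >= min_len and shannon_entropy(t) > threshold]
```

Neither function would flag a sentence such as "58-year-old from a small coastal town, diagnosed in March", which is exactly the context-dependent case described above.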
Policy Enforcement Gates
Policy enforcement must happen before data persists, not after.
| Gate | What It Enforces |
|---|---|
| PII gate | Blocks or redacts personal data before it enters the audit chain |
| Secret gate | Detects and blocks API keys, tokens, credentials in outputs |
| Confidence gate | Routes low-confidence decisions to human review |
| Drift gate | Blocks or flags outputs when model behavior deviates from baseline |
| Compliance gate | Enforces policy rules (prohibited content, required disclosures) |
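One way to read "before data persists" is that the write to storage is the last step of a pipeline and only runs if every gate passes. The sketch below shows three of the gates from the table under that reading. The `Decision` type, the gate functions, the regular expressions, and the 0.8 confidence threshold are all hypothetical names and values introduced for illustration.

```python
import re
from dataclasses import dataclass, field

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SECRET = re.compile(r"\b(?:sk|api|key)_[A-Za-z0-9]{16,}\b")  # illustrative only


@dataclass
class Decision:
    output: str
    confidence: float
    actions: list = field(default_factory=list)


def secret_gate(d: Decision) -> bool:
    """Block the decision outright if the output contains a credential."""
    if SECRET.search(d.output):
        d.actions.append("blocked_secret")
        return False
    return True


def pii_gate(d: Decision) -> bool:
    """Redact SSN-shaped strings in place; never blocks."""
    redacted, count = SSN.subn("[REDACTED-SSN]", d.output)
    if count:
        d.output = redacted
        d.actions.append("pii_redacted")
    return True


def confidence_gate(d: Decision, threshold: float = 0.8) -> bool:
    """Mark low-confidence decisions for human review; never blocks."""
    if d.confidence < threshold:
        d.actions.append("routed_to_human_review")
    return True


def run_gates(d: Decision, persist) -> bool:
    """Run every gate, then persist. Nothing is written if a gate blocks."""
    for gate in (secret_gate, pii_gate, confidence_gate):
        if not gate(d):
            return False
    persist(d)
    return True
```

Calling `run_gates(decision, store.append)` with a plain list as the store is enough to try it out. The point of the design is that the caller has no other path to `persist`, so a skipped gate means a skipped write rather than an unredacted record.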
AI Explainability
SHAP (SHapley Additive exPlanations)
- Assigns each feature an importance value for a specific prediction
- Mathematically grounded in game theory; model-agnostic
- Computationally expensive for large models
- Reference: Lundberg & Lee (2017)
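To make the game-theory grounding concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature coalition. It is not the SHAP library and does not use its API; `shapley_values`, the `predict` callable, and the baseline-substitution scheme for "missing" features are assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x: list, baseline: list) -> list:
    """Exact Shapley value of each feature for one prediction.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear model each value reduces to weight times (feature minus baseline), and the values always sum to the difference between the prediction and the baseline prediction. The nested loops visit every subset of features, so the cost doubles with each added feature; that is the expense noted above and the reason practical tools rely on sampling or model-specific shortcuts.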
LIME (Local Interpretable Model-agnostic Explanations)
- Creates a local approximation of model behavior around a specific prediction
- Faster than SHAP for large datasets; less mathematically rigorous
- Reference: Ribeiro, Singh & Guestrin (2016)
Model Cards
- Structured documentation of model purpose, performance, and limitations
- Should include: intended use, out-of-scope uses, evaluation data, metrics, ethical considerations
- Increasingly required for high-risk AI deployments
The Six Types of AI Drift
| Drift Type | What Changes | Detection Method |
|---|---|---|
| Embedding drift | Statistical distribution of input representations | Monitor embedding space distances |
| Semantic drift | Meaning of inputs (distribution may appear stable) | Track semantic similarity scores |
| Retrieval/RAG drift | Retrieved documents become less relevant | Monitor retrieval relevance scores |
| Evaluation drift | Model performance on labeled data degrades | Regular benchmark testing |
| Prompt drift | User prompt patterns shift, affecting outputs | Track prompt clustering |
| Grounding degradation | Factual accuracy of outputs decreases over time | Monitor against ground truth |
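For the first row of the table, one simple way to "monitor embedding space distances" is to compare the centroid of a baseline window with the centroid of a recent window. The sketch below does that with cosine distance. The function names and the 0.1 threshold are assumptions for illustration, and a centroid comparison is only one of several possible distance measures; it assumes non-empty windows and non-zero centroids.

```python
import math


def centroid(vectors: list) -> list:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: list, b: list) -> float:
    """1 minus cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def embedding_drift(baseline: list, current: list, threshold: float = 0.1):
    """Return (distance, drifted) for two windows of embedding vectors."""
    distance = cosine_distance(centroid(baseline), centroid(current))
    return distance, distance > threshold
```

A centroid can stay put while the spread of the distribution changes, so in practice this kind of check would sit alongside the other detection methods in the table rather than replace them.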
Part 3: Observability as Strategic Infrastructure
Observability vs. Compliance Evidence
| Observability | Compliance Evidence |
|---|---|
| Answers: "What is happening?" | Answers: "Can I prove what happened?" |
| Mutable logs | Tamper-evident, signed records |
| Vendor-specific formats | Framework-mapped, portable |
| Operational focus | Audit focus |
| Real-time dashboards | Historical evidence bundles |
| Detects problems | Proves you responded to problems |
The gap that gets organizations fined: Having observability without compliance evidence. Regulators don't want to see that you have monitoring. They want proof that monitoring produced evidence you acted on.
OpenTelemetry: The Governance-Neutral Foundation
OpenTelemetry (OTel) is the CNCF project that defines an open standard for telemetry data: traces, metrics, and logs. It provides a vendor-neutral foundation for AI governance infrastructure.
Why OTel matters for governance:
| Property | Value |
|---|---|
| Portability | Telemetry data is not locked to a single vendor |
| Standardization | Consistent data format across all systems |
| Integration | Connects to Datadog, Grafana, Splunk, and any observability platform |
| Auditability | Structured data is easier to sign and verify |
The Governance Layer on Top of OTel: Standard OTel captures what happened. Governance infrastructure also needs to capture proof that you managed it — through cryptographic signing of spans, framework-level tagging (which article/requirement does this satisfy?), and automated evidence bundling.
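The idea of a governance layer over span data can be sketched without any OTel dependency by treating a span as a plain dictionary. The example below attaches framework clause tags and an HMAC signature to such a record. It does not use the OpenTelemetry SDK; the `governance.clauses` and `governance.signature` keys and both function names are hypothetical and chosen only for this illustration.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Stable byte form of a record so the same content always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def governance_span(name: str, attributes: dict, clauses: list, key: bytes) -> dict:
    """Build a span-like record tagged with framework clauses, then sign it."""
    span = {
        "name": name,
        "attributes": dict(attributes),
        "governance.clauses": sorted(clauses),
    }
    span["governance.signature"] = hmac.new(
        key, _canonical(span), hashlib.sha256).hexdigest()
    return span


def verify_span(span: dict, key: bytes) -> bool:
    """Recompute the signature over everything except the signature itself."""
    unsigned = {k: v for k, v in span.items() if k != "governance.signature"}
    expected = hmac.new(key, _canonical(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(span.get("governance.signature", ""), expected)
```

Because the clause tags are inside the signed payload, changing which requirement a span claims to satisfy invalidates the signature in the same way as changing the span's attributes.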
Vendor-Neutral Architecture Principles
- Instrument at the SDK layer — before data reaches any specific vendor
- Use open protocols — OTel, JSON, standard formats
- Separate storage from analysis — raw signed records stored independently from analysis tools
- Framework-mapped exports — evidence bundles portable and not dependent on a specific platform to read
Part 4: Organizational Implementation
The COSO Framework Applied to AI Governance
| COSO Component | AI Governance Application |
|---|---|
| Control Environment | Leadership commitment to AI governance; clear accountability structures |
| Risk Assessment | Pre-deployment risk assessment; ongoing risk monitoring |
| Control Activities | Policy enforcement gates; human review workflows; change management |
| Information & Communication | Audit trails; incident reporting; governance dashboards |
| Monitoring Activities | Drift detection; compliance metrics; audit cycle |
The T.R.U.S.T. Scorecard
A board-level framework for measuring AI trustworthiness:
| Dimension | What It Measures | Key Metrics |
|---|---|---|
| T — Transparency | Can the organization explain AI decisions? | Explainability coverage, audit log completeness |
| R — Reliability | Does the AI perform consistently and accurately? | Accuracy rates, drift frequency, incident rate |
| U — User Trust | Do users trust and understand the AI? | Override rates, complaint rates, appeal volume |
| S — Security | Is the AI protected against attacks? | Adversarial test results, breach incidents |
| T — Traceability | Can every decision be traced back to its inputs? | Audit trail coverage, evidence bundle quality |
Organizational Roles in AI Governance
| Role | Responsibility | Key Outputs |
|---|---|---|
| AI System Owner | End-to-end accountability for a specific AI system | Risk assessment, governance policy |
| ML Engineer | Technical implementation of governance controls | Instrumented pipelines, drift detection |
| Compliance Officer | Framework interpretation and evidence review | Framework mapping, audit readiness |
| Data Protection Officer | Privacy compliance (GDPR, HIPAA) | DPIAs, RoPA, breach response |
| CISO / Security | Security of AI systems and data | Security audits, incident response |
| Legal | Regulatory interpretation | Lawful basis documentation, contract review |
| Executive Sponsor | Board-level accountability | T.R.U.S.T. scorecard, governance strategy |
The Limits of Governance
Governance CAN:
- Prove that risk was assessed before deployment
- Prove that monitoring was in place
- Prove that incidents were detected and responded to
- Prove that human oversight was applied
- Generate auditor-ready evidence
Governance CANNOT:
- Make a biased model unbiased
- Make an inaccurate model accurate
- Prevent all harms from AI systems
- Substitute for appropriate model development and testing
- Guarantee regulatory compliance (compliance is a legal determination)
Part 5: Governance Maturity Model
Five levels of maturity from reactive to governance-native:
| Level | State | Characteristics | Audit Outcome |
|---|---|---|---|
| 1: Reactive | No governance | Logs exist but are mutable. Risk assessment done retroactively. No policies enforced. | Non-compliant findings |
| 2: Observability | Logs and dashboards exist | Monitoring in place. Records mutable, no policy enforcement, no framework mapping. | Partial compliance |
| 3: Policy Enforcement | Policies enforced | PII redaction, drift alerts, escalation paths in place. Evidence mapping is manual. | Framework alignment gap |
| 4: Evidence Automation | HMAC-signed, auto-mapped bundles | Evidence generation automatic. Audit cycle reduces from weeks to days. | Strong compliance, 2-week audit cycle |
| 5: Governance-Native | Governance embedded in engineering | Compliance is a property of the system, not a separate process. Continuous evidence generation. | Governance advantage |
Moving Between Levels
Level 1 → 2: Deploy logging and monitoring. Table stakes — necessary foundation, but not compliance.
Level 2 → 3: Add policy enforcement. PII detection, secret scanning, confidence thresholds, escalation workflows. Move from observation to control.
Level 3 → 4: Automate evidence generation. Add cryptographic signing, framework mapping, evidence bundling. Move from manual to automatic proof.
Level 4 → 5: Embed governance in engineering culture. Compliance is no longer just a compliance team responsibility — it's built into every deployment, every model update, every incident response.
Part 6: Common Governance Failure Modes
Understanding failure modes is as important as knowing what should go right.
Failure Mode 1: Alerts Exist but Are Ignored
Symptom: Drift alerts fire weekly. Nobody investigates.
Why it matters: Auditor asks "show me how you responded to drift." Ignored alerts don't constitute a compliance defense — they document negligence.
Fix:
- Reduce false positives (tune alert baseline from production data)
- Clear ownership: one person per alert type
- SLA: investigate within 4 hours, close within 24 hours
- Document all investigations — the audit trail of your response matters as much as the alert itself
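The two SLA targets in the fix list are easy to check mechanically. The sketch below flags alerts that missed either one. The alert dictionary layout (`fired_at`, `investigated_at`, `closed_at`) and the function name are assumptions for this example; the 4-hour and 24-hour values come from the list above.

```python
from datetime import datetime, timedelta

INVESTIGATE_SLA = timedelta(hours=4)
CLOSE_SLA = timedelta(hours=24)


def sla_breaches(alert: dict, now: datetime) -> list:
    """Return which SLAs an alert has missed, measured from when it fired.

    An alert that is still open is measured against the current time.
    """
    breaches = []
    investigated = alert.get("investigated_at") or now
    if investigated - alert["fired_at"] > INVESTIGATE_SLA:
        breaches.append("investigate")
    closed = alert.get("closed_at") or now
    if closed - alert["fired_at"] > CLOSE_SLA:
        breaches.append("close")
    return breaches
```

Storing the result alongside the investigation notes gives the audit trail both the response and the evidence of whether it was timely.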
Failure Mode 2: Policies Exist but Aren't Enforced
Symptom: "All changes require review." Sometimes they do, sometimes they don't.
Why it matters: Inconsistent enforcement documents negligence more clearly than no policy at all. Regulators view "policy exists but wasn't followed" as evidence of systemic failure.
Fix: Automate enforcement. Make it technically impossible to bypass policy review. If a human can skip the gate, eventually a human will skip the gate.
Failure Mode 3: Audit Logs Are Mutable
Symptom: Logs stored in a database with admin access. Records could be edited without detection.
Why it matters: You cannot prove records haven't been tampered with. Regulators increasingly require tamper-evident records — especially for financial, healthcare, and employment AI decisions.
Fix: HMAC chaining + WORM (Write Once Read Many) storage + separation of duties. No single person should be able to modify records without detection.
Failure Mode 4: Human Oversight Only on Paper
Symptom: "High-risk decisions require human review." Humans route them through without reading.
Why it matters: EU AI Act Article 14 requires meaningful human oversight. Rubber-stamping is not oversight. Regulators will ask how long reviewers spend per decision and whether they have the information needed to actually review.
Fix:
- Reduce escalation volume (route only genuinely ambiguous decisions)
- Give reviewers context: why was this flagged? What was the AI's reasoning?
- Track reviewer decisions and give feedback on past outcomes
- Allow enough time to actually review — not 30 decisions per hour
Failure Mode 5: Alert Fatigue Masks Real Drift
Symptom: Real drift goes undetected because the team has learned to ignore alerts — because you also alert on "0.02% latency change."
Why it matters: When everything is urgent, nothing is.
Fix:
- Strict alert criteria: only alert on genuinely significant deviations
- Triage by severity: P1 (immediate), P2 (same day), P3 (weekly review)
- Monthly false positive rate review — if >20% of alerts are false positives, tune thresholds
- Separate operational alerts from compliance alerts
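Severity triage and the monthly false positive review can both be expressed as small functions. In the sketch below the percentage cut-offs for P1, P2, and P3 are placeholder assumptions that a team would replace with values tuned from its own production data; only the 20% false positive limit comes from the list above. The alert field names are also hypothetical.

```python
def triage(deviation_pct: float, p1: float = 10.0, p2: float = 5.0,
           p3: float = 1.0):
    """Map a deviation from baseline (in percent) to a severity.

    Returns None when the deviation is too small to alert on at all.
    """
    magnitude = abs(deviation_pct)
    if magnitude >= p1:
        return "P1"
    if magnitude >= p2:
        return "P2"
    if magnitude >= p3:
        return "P3"
    return None


def false_positive_rate(alerts: list) -> float:
    """Share of closed alerts that were marked as false positives."""
    closed = [a for a in alerts if a["status"] == "closed"]
    if not closed:
        return 0.0
    return sum(1 for a in closed if a["false_positive"]) / len(closed)


def needs_tuning(alerts: list, limit: float = 0.20) -> bool:
    """True when the false positive rate exceeds the review limit."""
    return false_positive_rate(alerts) > limit
```

With these defaults a 0.02% latency change returns `None` and never reaches anyone's queue, which is the behavior the symptom above is missing.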
Failure Mode 6: Retention Policy Conflicts With Compliance Requirements
Symptom: Compliance requires a 7-year audit trail. Operations wants logs deleted after 90 days.
Why it matters: Whichever rule is followed, the other is broken. Deleting at 90 days violates the regulatory requirement, and keeping records for 7 years violates the internal retention policy. Neither team knows, because they have never discussed it.
Fix:
- Separate hot and cold storage: operational logs (90-day) vs. compliance records (7-year)
- Calculate the true cost of compliance storage — far less than the cost of a regulatory fine
- Make retention policy a governance decision, not an operations decision
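The hot and cold split amounts to giving each record a class and applying a different retention window per class. The sketch below is a minimal version of that rule. The class names, record layout, and function names are assumptions for this example, and the 7-year window is approximated as 7 × 365 days without leap-day handling.

```python
from datetime import date, timedelta

# Retention windows per record class, in days (values from the list above)
RETENTION_DAYS = {
    "operational": 90,
    "compliance": 7 * 365,  # approximation: ignores leap days
}


def is_expired(record_class: str, created: date, today: date) -> bool:
    """True when a record has outlived the window for its class."""
    return today - created > timedelta(days=RETENTION_DAYS[record_class])


def purge(records: list, today: date) -> list:
    """Return only the records that must still be kept."""
    return [r for r in records
            if not is_expired(r["class"], r["created"], today)]
```

Keeping the windows in one table that the governance function owns, rather than in scattered job configurations, is one way to make retention a governance decision in practice.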
Failure Mode 7: Framework Mapping Is Manual
Symptom: You generate evidence. A compliance analyst manually tags which regulatory articles each piece satisfies. This takes weeks before each audit.
Why it matters: Manual mapping is slow, error-prone, and doesn't scale. As AI systems multiply, manual mapping becomes the bottleneck for every audit.
Fix: Automate mapping. Each policy enforcement action should automatically tag which framework clause it satisfies. Evidence bundles generated from tags, not assembled manually.
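Automated mapping can be as simple as a lookup from event type patterns to framework clauses, applied when the event is recorded. The sketch below uses a subset of the clause mappings listed in the Compliance Event Primitives table later in this guide. The functions, the event dictionary layout, and the specific event type strings in the usage are hypothetical; this is not the SpanForge SDK API.

```python
from fnmatch import fnmatchcase

# Pattern -> clauses, copied from this guide's Compliance Event Primitives table
CLAUSE_MAP = {
    "consent.*": ["GDPR Art. 22", "EU AI Act Art. 14"],
    "hitl.*": ["EU AI Act Art. 14", "GDPR Art. 22", "NIST MANAGE"],
    "explanation.*": ["EU AI Act Art. 13", "NIST MAP 1.1"],
    "llm.redact.*": ["GDPR Art. 25", "HIPAA §164.312"],
    "llm.audit.*": ["SOC 2 CC6.1", "HIPAA §164.312", "ISO 42001 A.7"],
}


def tag_event(event: dict) -> dict:
    """Return a copy of the event with the clauses its type maps to."""
    clauses = []
    for pattern, mapped in CLAUSE_MAP.items():
        if fnmatchcase(event["type"], pattern):
            clauses.extend(mapped)
    return {**event, "clauses": clauses}


def bundle_by_clause(events: list) -> dict:
    """Group event ids under each clause they provide evidence for."""
    bundle = {}
    for event in map(tag_event, events):
        for clause in event["clauses"]:
            bundle.setdefault(clause, []).append(event["id"])
    return bundle
```

Because tagging happens at write time, assembling a bundle for one framework becomes a filter over existing tags instead of a manual review of every record.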
Part 7: Scenario-Based Workbook
Scenario 1: Healthcare AI — Treatment Recommendation Challenged
Situation: Your LLM-based clinical decision support tool recommends against a treatment. A patient's representative requests an explanation and challenges the decision.
What auditors ask:
- Can you show the exact inputs that produced this recommendation?
- Was a human clinician involved in the final decision?
- Is the model's output explainable in plain language?
- Was any PHI retained in your logs?
What governance infrastructure provides:
- Tamper-evident record of the exact input-output pair
- Human oversight log showing clinician review
- SHAP-based explanation of contributing factors
- PII/PHI audit showing what was redacted before logging
Scenario 2: Hiring AI — Discrimination Complaint
Situation: A rejected candidate files a discrimination complaint, alleging your AI hiring system is biased against older workers.
What auditors ask:
- What were the demographic characteristics of candidates screened?
- What is your model's false rejection rate by age group?
- Did you test for age bias before deployment?
- Have you monitored for age bias since deployment?
What governance infrastructure provides:
- Pre-deployment risk assessment documenting bias testing
- Monthly fairness audit logs showing demographic distribution
- Drift detection records showing any change in demographic patterns
- Evidence bundle mapping to EU AI Act Article 9
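The auditor's question about false rejection rate by age group has a direct computation behind it, shown below as a sketch. It assumes each screening record carries a group label, the model's decision, and a ground-truth `qualified` flag obtained some other way (for example from later human review); those field names and the function name are hypothetical.

```python
from collections import defaultdict


def false_rejection_rate_by_group(records: list) -> dict:
    """For each group: rejected-but-qualified candidates / all qualified candidates."""
    qualified = defaultdict(int)
    wrongly_rejected = defaultdict(int)
    for r in records:
        if r["qualified"]:
            qualified[r["group"]] += 1
            if r["rejected"]:
                wrongly_rejected[r["group"]] += 1
    return {group: wrongly_rejected[group] / count
            for group, count in qualified.items()}
```

Running this on a schedule and storing the output is what turns "we monitor for bias" into the monthly fairness audit log listed above. Whether a given gap between groups is acceptable remains a legal and policy judgment, not something the function decides.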
Scenario 3: Financial AI — Regulatory Investigation
Situation: A financial regulator opens an investigation into your credit scoring AI following a pattern of complaints.
What auditors ask:
- Show me your risk assessment before this system was deployed
- Show me your governance policy
- Show me 12 months of monitoring reports
- Can you prove records haven't been altered?
What governance infrastructure provides:
- Pre-deployment risk assessment (signed, dated)
- Governance policy document with version history
- 12 months of automated monitoring reports
- HMAC-signed audit trail proving tamper-evidence
Scenario 4: LLM Deployment — Training Data Leak
Situation: A security researcher demonstrates that your customer-facing LLM can be prompted to output PII from its training data.
What auditors ask:
- Did you test for training data memorization before deployment?
- Do you monitor for PII in outputs in production?
- How quickly did you detect and respond?
- What data was exposed, and to how many people?
What governance infrastructure provides:
- Pre-deployment testing records including memorization tests
- Production PII detection logs showing what was caught
- Incident response timeline with evidence of immediate action
- Scope assessment records showing exposure extent
Scenario 5: Third-Party AI — Vendor Failure
Situation: A third-party AI vendor you rely on suffers a breach. Your customers' data was processed by their systems.
What auditors ask:
- What due diligence did you conduct on this vendor?
- What does your Data Processing Agreement say?
- How quickly did you detect the breach?
- How did you notify affected customers?
What governance infrastructure provides:
- Vendor assessment records
- DPA documentation
- Data flow maps showing what went to which vendor
- Breach response timeline and notification records
Appendix A: Self-Assessment
Rate your current state for each capability:
| Capability | Current | Target |
|---|---|---|
| Risk Assessment | ☐ | Documented, dated, signed |
| Policy Enforcement | ☐ | 100% of deployments reviewed |
| Audit Trails | ☐ | Tamper-evident, signed, 7+ years |
| Drift Detection | ☐ | Automated, <4hr SLA |
| Framework Mapping | ☐ | Automatic, 6+ frameworks |
| Evidence Bundling | ☐ | <5 min auditor-ready bundle |
| Human Oversight | ☐ | Meaningful, documented, auditable |
| Incident Response | ☐ | Tested, SLA-bound, documented |
Scoring:
- Mostly unchecked: Maturity Level 1–2 (Reactive / Observability)
- Partially checked: Maturity Level 3 (Policy Enforcement)
- Mostly checked: Maturity Level 4–5 (Evidence Automation / Governance-Native)
Appendix B: Full References & Citations
Official Regulatory Texts
- EU AI Act (Regulation 2024/1689): https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
- GDPR (Regulation 2016/679): https://gdpr-info.eu
- HIPAA (45 CFR Parts 160 and 164): https://www.hhs.gov/hipaa
- 45 CFR §164.514 — Safe Harbor De-Identification Standard
- HIPAA Breach Notification Rule — 45 CFR §164.400–414
NIST Publications
- NIST AI RMF 1.0 (NIST AI 100-1, 2023): https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
- NIST SP 800-92 — Guide to Computer Security Log Management: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-92.pdf
- NIST SP 800-188 — De-Identifying Government Datasets: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-188.pdf
- NIST SP 800-53 Rev 5 — Security and Privacy Controls for Information Systems
ISO Standards
- ISO 42001:2023 — Artificial Intelligence — Management system
- ISO/IEC 27001:2022 — Information security management systems
- ISO 31000:2018 — Risk management — Guidelines
- ISO/IEC 23894:2023 — Artificial Intelligence — Guidance on risk management
OpenTelemetry & Infrastructure Standards
- OpenTelemetry Specification: https://opentelemetry.io/docs/specs/otel/
- OpenTelemetry Protocol (OTLP): https://opentelemetry.io/docs/specs/otlp/
- OpenTelemetry Semantic Conventions for Gen AI: https://opentelemetry.io/docs/specs/semconv/gen-ai/
Cryptography & Supply Chain
- HMAC — RFC 2104: https://www.rfc-editor.org/rfc/rfc2104
- SLSA Framework (Supply-chain Levels for Software Artifacts): https://slsa.dev/
- CycloneDX (Software Bill of Materials): https://cyclonedx.org/
- SPDX (Software Package Data Exchange): https://spdx.dev/
Academic Papers
- Sculley, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. NeurIPS 2015. https://arxiv.org/abs/1505.00713
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. KDD 2016. https://arxiv.org/abs/1602.04938
- Lundberg, S. M., & Lee, S. I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS 2017. https://arxiv.org/abs/1705.07874
- Lipton, Z. C. (2018). The Mythos of Model Interpretability. Queue, 16(3). https://dl.acm.org/doi/10.1145/3236386.3241340
- Carlini, N., et al. (2021). Extracting Training Data from Large Language Models. USENIX Security 2021. https://arxiv.org/abs/2012.07805
- Bommasani, R., et al. (2021). On the Opportunities and Risks of Foundation Models. Stanford CRFM. https://arxiv.org/abs/2108.07258
AI Safety & Security
- OWASP Top 10 for Large Language Models: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- MITRE ATLAS (Adversarial Threat Landscape for AI Systems): https://atlas.mitre.org/
- AI Incident Database: https://incidentdatabase.ai/
Enterprise Governance Frameworks
- COSO Enterprise Risk Management (2017 Framework): https://www.coso.org/
- AICPA SOC 2 Trust Service Criteria (2022): https://www.aicpa.org/
- ISACA COBIT 2019: https://www.isaca.org/resources/cobit
Explainability & Fairness
- Mitchell, M., et al. (2019). Model Cards for Model Reporting. FAT* 2019. https://arxiv.org/abs/1810.03993
- IBM AI Fairness 360: https://aif360.mybluemix.net/
- Microsoft Responsible AI Standard: https://www.microsoft.com/en-us/ai/responsible-ai
SpanForge SDK: Cross-Framework Compliance in Practice
The SpanForge SDK provides a single, unified compliance infrastructure that covers all six frameworks in this guide. Rather than building separate systems for each regulation, the ComplianceMappingEngine maps your AI telemetry to the specific clauses of each framework — from one event stream.
Cross-Framework SDK Mapping
| Framework | SDK Framework Key | Core Clauses Covered | Primary Event Types |
|---|---|---|---|
| EU AI Act | eu_ai_act | Art. 13 (Transparency), Art. 14 (Human Oversight), Annex IV.5 (Technical Docs) | explanation.*, hitl.*, consent.*, llm.guard.* |
| GDPR | gdpr | Art. 22 (Automated Decisions), Art. 25 (Privacy by Design), Art. 17 (Erasure) | consent.*, hitl.*, llm.redact.* |
| HIPAA | hipaa | §164.312 (PHI Access Controls & Audit) | llm.redact.*, llm.audit.* |
| ISO 42001 | iso_42001 | A.5–A.10 (Full AI Management System controls) | Full event set |
| NIST AI RMF | nist_ai_rmf | MAP 1.1 (Risk Identification), GOVERN, MEASURE, MANAGE | llm.trace.*, llm.eval.*, model_registry.*, explanation.* |
| SOC 2 | soc2 | CC6.1 (Access Controls), CC7.2 (Monitoring), CC8.1 (Change Management) | llm.audit.*, llm.trace.*, model_registry.* |
Generating Evidence Packages for Any Framework
from spanforge.core.compliance_mapping import ComplianceMappingEngine

engine = ComplianceMappingEngine()

# Generate for any of the six frameworks with the same API
for framework in ["eu_ai_act", "gdpr", "hipaa", "iso_42001", "nist_ai_rmf", "soc2"]:
    package = engine.generate_evidence_package(
        model_id="your-model-id",
        framework=framework,
        from_date="2026-01-01",
        to_date="2026-03-31",
    )
    print(f"{framework}: {package.gap_report}")
The Compliance Event Primitives
Every framework maps to the same underlying event types — this is the architectural insight that makes cross-framework compliance tractable:
| Event Category | Event Types | Frameworks Served |
|---|---|---|
| Consent | consent.granted, consent.revoked, consent.violation | GDPR Art. 22, EU AI Act Art. 14 |
| Human-in-the-Loop | hitl.queued, hitl.reviewed, hitl.escalated, hitl.timeout | EU AI Act Art. 14, GDPR Art. 22, NIST MANAGE |
| Explainability | explanation.generated | EU AI Act Art. 13, NIST MAP 1.1 |
| PII Redaction | llm.redact.* | GDPR Art. 25, HIPAA §164.312 |
| Audit Logging | llm.audit.* | SOC 2 CC6.1, HIPAA §164.312, ISO 42001 A.7 |
| Model Registry | model_registry.* | ISO 42001 A.5, NIST GOVERN, SOC 2 CC8.1 |
| Guardrails | llm.guard.* | EU AI Act Annex IV.5, NIST MANAGE |
Conclusion
You now have:
- Comprehensive framework knowledge across 6 regulatory standards (EU AI Act, GDPR, HIPAA, SOC 2, ISO 42001, NIST AI RMF)
- Technical architecture for governance-native AI across all three layers
- A maturity model to assess your current state and plan the path forward
- Seven failure modes to recognize and avoid in production
- Five scenarios showing what governance looks like under real regulatory scrutiny
- A self-assessment tool to prioritize your next 90 days
- Complete citations for all frameworks, standards, and research
Recommended Next Steps
- Complete Appendix A — score your current maturity for each capability
- Identify your top 2–3 failure modes — which ones apply to your current systems?
- Plan a 90-day roadmap — prioritize by highest regulatory risk and fastest path to Level 3
- Work through the scenarios — practice answering the auditor questions for your specific systems
- Teach one framework to others — the best way to solidify understanding is to explain it
This positions you at advanced practitioner competency: able to operate effectively across the regulatory, technical, and organizational sides of AI governance.