Compliance & Tenant Isolation

The spanforge.core.compliance_mapping module provides programmatic compliance tests that enterprise teams and third-party tool authors can run in CI pipelines, at deployment time, or as part of security audits — without requiring pytest.

Compatibility checklist (`test_compatibility`)

The five-point compatibility checklist verifies that a batch of events meets the spanforge v1.0 adoption requirements:

from spanforge.compliance import test_compatibility

result = test_compatibility(events)

if result.passed:
    print(f"All {result.events_checked} events are compatible.")
else:
    for v in result.violations:
        print(f"[{v.check_id}] {v.event_id}: {v.rule} — {v.detail}")

The five checks:

Check	Rule	Description
CHK-1	Required fields present	`schema_version`, `source`, and `payload` must be non-empty.
CHK-2	Event type registered or valid custom	Must be a first-party `EventType` value or pass `validate_custom`.
CHK-3	Source identifier format	Must match `^[a-z][a-z0-9-]@\d+\.\d+(\.\d+)?([.-][a-z0-9]+)$`.
CHK-5	Event ID is a valid ULID	`event_id` must be a well-formed 26-character ULID string.

Audit chain integrity (`verify_chain_integrity`)

Wraps verify_chain() with higher-level diagnostics:

from spanforge.core.compliance_mapping import verify_chain_integrity

result = verify_chain_integrity(
    events,
    org_secret="my-org-secret",
    check_monotonic_timestamps=True,   # default
)

print(f"Events verified: {result.events_verified}")
print(f"Gaps detected:   {result.gaps_detected}")

for v in result.violations:
    # v.violation_type: "tampered" | "gap" | "non_monotonic_timestamp"
    print(f"[{v.violation_type}] {v.event_id}: {v.detail}")

Multi-tenant isolation (`verify_tenant_isolation`)

Verify that events from two tenants share no org_id values and that each tenant's events are internally consistent:

from spanforge.core.compliance_mapping import verify_tenant_isolation

result = verify_tenant_isolation(
    tenant_a_events,
    tenant_b_events,
    strict=True,    # flag events missing org_id (default)
)

if not result:
    for v in result.violations:
        # v.violation_type: "missing_org_id" | "mixed_org_ids" | "shared_org_id"
        print(f"  {v.violation_type}: {v.detail}")

Scope verification (`verify_events_scoped`)

Assert that all events in a batch belong to an expected org/team:

from spanforge.core.compliance_mapping import verify_events_scoped

result = verify_events_scoped(
    events,
    expected_org_id="org_01HX",
    expected_team_id="team_engineering",
)

if not result:
    for v in result.violations:
        # v.violation_type: "wrong_org_id" | "wrong_team_id"
        print(f"  {v.event_id}: {v.detail}")

Using compliance results

All result objects are truthy on success and falsy on failure:

result = verify_tenant_isolation(a, b)
assert result, f"Isolation failed: {result.violations}"

# Or use in conditional logic:
if not result:
    notify_security_team(result.violations)

CI integration

# conftest.py or test_compliance.py
import pytest
from spanforge.compliance import test_compatibility

def test_all_events_compatible(captured_events):
    result = test_compatibility(captured_events)
    assert result, "\n".join(
        f"  [{v.check_id}] {v.event_id}: {v.detail}"
        for v in result.violations
    )

Regulatory compliance mapping (`ComplianceMappingEngine`)

The ComplianceMappingEngine maps spanforge telemetry events to clauses in regulatory frameworks and generates full evidence packages — including gap analysis and HMAC-signed attestations.

Supported frameworks

Framework	Key
EU AI Act	`eu_ai_act`
ISO 42001	`iso_42001`
NIST AI RMF	`nist_ai_rmf`
GDPR	`gdpr`
SOC 2	`soc2`

Generating an evidence package

from spanforge.core.compliance_mapping import ComplianceMappingEngine

engine = ComplianceMappingEngine()
package = engine.generate_evidence_package(
    model_id="gpt-4o",
    framework="eu_ai_act",
    from_date="2026-01-01",
    to_date="2026-03-31",
    audit_events=events,  # list of event dicts; omit to load from TraceStore
)

print(package.framework)        # "eu_ai_act"
print(package.model_id)         # "gpt-4o"
print(len(package.mappings))    # number of clause mappings
print(package.gap_report)       # gap analysis with coverage stats
print(package.attestation)      # HMAC-signed attestation dict

CLI usage

# Generate evidence package
spanforge compliance generate --model-id gpt-4o --framework eu_ai_act --from 2026-01-01 --to 2026-03-31 --events-file events.jsonl

# Run a compliance gate over an events file
spanforge compliance check --framework eu_ai_act --from 2026-01-01 --to 2026-03-31 --events-file events.jsonl

# Verify attestation signature only
spanforge compliance validate-attestation evidence.json

The attestation signature uses HMAC-SHA256 with the signing key from SPANFORGE_SIGNING_KEY (falls back to "spanforge-default" with a warning).

Clause-to-event-prefix mapping

The engine maps spanforge telemetry event prefixes to specific regulatory clauses. The following table summarises the key mappings:

Framework	Clause	Event prefixes	Description
GDPR	Art. 22	`consent.`, `hitl.`	Automated Individual Decision-Making — consent and oversight
GDPR	Art. 25	`llm.redact.`, `consent.`	Data Protection by Design
EU AI Act	Art. 13	`explanation.*`	Transparency — explainability of AI decisions
EU AI Act	Art. 14	`hitl.`, `consent.`	Human Oversight — HITL review and escalation
EU AI Act	Annex IV.5	`llm.guard.`, `llm.audit.`, `hitl.*`	Technical Documentation — safety and oversight
SOC 2	CC6.1	`llm.audit.`, `llm.trace.`, `model_registry.*`	Logical and Physical Access Controls
NIST AI RMF	MAP 1.1	`llm.trace.`, `llm.eval.`, `model_registry.`, `explanation.`	Risk identification and mapping

Model registry enrichment

When a model is registered in the ModelRegistry, the attestation is automatically enriched with metadata:

from spanforge.model_registry import ModelRegistry

registry = ModelRegistry()
registry.register("gpt-4o", owner="ml-team", risk_tier="high")

# Evidence packages now include:
# attestation.model_owner    → "ml-team"
# attestation.model_risk_tier → "high"
# attestation.model_status   → "active"
# attestation.model_warnings → []  (empty if active/registered)

Warnings are automatically emitted for:

Deprecated models — "model 'X' is deprecated"
Retired models — "model 'X' is retired"
Unregistered models — "model 'X' not found in registry"

Explanation coverage metric

The engine computes the percentage of decision events (llm.trace.* and hitl.*) that have corresponding explanation.* events:

package = engine.generate_evidence_package(
    model_id="gpt-4o",
    framework="eu_ai_act",
    from_date="2026-01-01",
    to_date="2026-03-31",
    audit_events=events,
)

print(package.attestation.explanation_coverage_pct)  # e.g. 75.0

This metric is also available via the /compliance/summary HTTP endpoint.

Regulation-specific PII compliance (Phase 3)

spanforge.sdk.pii exposes dedicated helpers for each major privacy regulation. All helpers are available on the sf_pii singleton:

from spanforge.sdk import sf_pii

GDPR — Article 17 (Right to Erasure)

When a data subject invokes their right to erasure ("right to be forgotten"), call erase_subject() to purge their PII from spanforge storage and receive a machine-readable receipt suitable for your compliance audit trail.

receipt = sf_pii.erase_subject(subject_id="user-42")
# ErasureReceipt
print(receipt.receipt_id)         # "erase-20260417-abc123"
print(receipt.subject_id_hash)    # SHA-256 of "user-42" (never the raw ID)
print(receipt.fields_erased)      # number of fields purged
print(receipt.timestamp)          # ISO 8601
print(receipt.audit_log_entry)    # dict ready for your audit log

Security note: The raw subject_id is never stored in the receipt. Only its SHA-256 digest is retained.

CCPA — Data Subject Access Request (DSAR)

California consumers may request a copy of their personal data held by your service. export_subject_data() collects all PII-tagged fields for a subject:

export = sf_pii.export_subject_data(subject_id="user-42")
# DSARExport
print(export.subject_id_hash)     # SHA-256 digest
print(export.record_count)        # number of records exported
for field in export.fields:
    print(field.field_path, field.pii_type, field.event_id)

Combine with erase_subject() to implement a full CCPA deletion-plus-export flow:

export = sf_pii.export_subject_data("user-42")
receipt = sf_pii.erase_subject("user-42")
store_compliance_record(export, receipt)

HIPAA — Safe Harbor De-identification

The 18 HIPAA Safe Harbor identifiers (name, dates, geographic data, phone, fax, email, SSN, etc.) are removed in a single call:

safe = sf_pii.safe_harbor_deidentify(
    {
        "name": "Alice Smith",
        "dob": "1980-01-15",
        "zip": "02139",
        "email": "alice@example.com",
        "ssn": "078-05-1120",
    }
)
# SafeHarborResult
print(safe.method)                  # "HIPAA_SAFE_HARBOR"
print(safe.original_field_count)    # 5
print(safe.redacted_field_count)    # 5
print(safe.redacted_record)         # dict with all 18 identifiers replaced

Training data audit — run safe harbor checks across a dataset before fine-tuning:

report = sf_pii.audit_training_data(training_records)
# TrainingDataPIIReport
print(report.pii_row_count)      # rows containing PII
print(report.total_row_count)    # total rows
print(report.risk_score)         # pii_row_count / total_row_count
print(report.entity_breakdown)   # {entity_type: count}

DPDP — India Digital Personal Data Protection Act (2023)

The DPDP Act requires consent before processing sensitive personal data. When a DPDP-regulated entity is detected without a consent record, the SDK raises SFPIIDPDPConsentMissingError:

from spanforge.sdk._exceptions import SFPIIDPDPConsentMissingError

try:
    result = sf_pii.scan_text("Aadhaar: 2950 7148 9635")
except SFPIIDPDPConsentMissingError as exc:
    # exc.subject_id_hash — SHA-256 of the subject ID
    # exc.entity_types    — ["IN_AADHAAR"]
    return consent_required_response(exc.entity_types)

DPDP entity types recognised:

Entity type	Description
`IN_AADHAAR`	12-digit UID (with/without spaces)
`IN_PAN`	Permanent Account Number (ABCDE1234F)
`IN_VOTER_ID`	Election Commission voter ID
`IN_PASSPORT`	Indian passport number
`IN_DRIVING_LICENSE`	Driving licence number

You can also use the low-level scan_payload() with DPDP_PATTERNS for backwards compatibility:

from spanforge import scan_payload, DPDP_PATTERNS

result = scan_payload(
    {"uid": "2950 7148 9635"},
    extra_patterns=DPDP_PATTERNS,
)

PIPL — China Personal Information Protection Law

China's PIPL defines sensitive categories of personal information. Scan for PIPL entities by passing language="zh":

result = sf_pii.scan_text("身份证: 110101199003077516", language="zh")
for entity in result.entities:
    print(entity.entity_type, entity.score)  # PIPL_NATIONAL_ID 0.98

PIPL entity types:

Entity type	Description
`PIPL_NATIONAL_ID`	18-digit Resident Identity Card number
`PIPL_PASSPORT`	Chinese passport number
`PIPL_MOBILE`	Chinese mainland mobile number (+86 …)
`PIPL_BANK_CARD`	Chinese bank card / UnionPay number
`PIPL_SOCIAL_CREDIT`	Unified Social Credit Code (企业)

EU AI Act Article 10 — Training Data Compliance Scanner

spanforge.sdk.dataset_scanner scans training datasets for EU AI Act Article 10 compliance before fine-tuning or deployment. It supports .jsonl, .json, .csv, .txt, and .parquet files and requires zero external dependencies.

Quickstart

from spanforge.sdk import scan_dataset_compliance

report = scan_dataset_compliance("./data/training/")
print(report.to_markdown())

Programmatic use

from spanforge.sdk import scan_dataset_compliance

report = scan_dataset_compliance("./data/training/", sign=True)

# Check every clause
for clause in report.eu_ai_act_article_10_clauses:
    status = "PASS" if clause.passed else "FAIL"
    print(f"[{status}] {clause.clause_id}  {clause.title}")
    if not clause.passed:
        print(f"       {clause.detail}")

# Gate on overall result
if not all(c.passed for c in report.eu_ai_act_article_10_clauses):
    raise SystemExit("Training data failed Article 10 checks — aborting fine-tune.")

Example output:

[PASS] Art.10(2)(a)  Data quality — PII density
[PASS] Art.10(2)(b)  Data governance — consent documentation
[FAIL] Art.10(2)(c)  Data collection — source provenance
       provenance_coverage_pct 62.0 < 80.0 threshold
[PASS] Art.10(2)(d)  Bias detection — vocabulary distribution

Article 10 clauses

Clause	What is checked	Pass condition
`Art.10(2)(a)`	PII entity density across the dataset	`pii_density_score < 1.0` per 1k tokens
`Art.10(2)(b)`	Rows that contain a consent field (`consent`, `consented`, `consent_given`, `opt_in`)	≥ 80 % of rows
`Art.10(2)(c)`	Rows that contain a provenance field (`source`, `origin`, `provenance`, `data_source`)	≥ 80 % of rows
`Art.10(2)(d)`	Vocabulary distribution skew (Zipf heuristic, ≥ 30 tokens required)	`bias_signal == "low"`

Preparing a compliant dataset

Add source and consent fields to every training record:

{"text": "The capital of France is Paris.", "label": "geography", "source": "wikipedia-2024-01", "consent": true}
{"text": "Photosynthesis converts sunlight into glucose.", "label": "science", "source": "textbook-bio-101", "consent": true}

CLI

# Markdown report (default)
spanforge compliance validate-dataset ./data/training/

# JSON report — pipe into jq / CI systems
spanforge compliance validate-dataset ./data/training/ --output json | jq '.eu_ai_act_article_10_clauses[] | select(.passed == false)'

# Skip signing (useful for read-only CI environments)
spanforge compliance validate-dataset ./data/training/ --no-sign

# Via the validate command
spanforge validate --dataset ./data/training/ --output report

Exit code 0 = all clauses pass; 1 = one or more clauses fail.

HMAC signature verification

Every report is signed with HMAC-SHA256 over its canonical JSON body. Set SPANFORGE_SIGNING_KEY to a strong secret before running in production:

export SPANFORGE_SIGNING_KEY=$(openssl rand -hex 32)
spanforge compliance validate-dataset ./data/training/ --output json > report.json

Verify the signature independently:

import hmac, hashlib, json, os

with open("report.json") as f:
    data = json.load(f)

signature = data.pop("hmac_signature")
body = json.dumps(data, sort_keys=True).encode()
key = os.environ["SPANFORGE_SIGNING_KEY"].encode()
expected = "hmac-sha256:" + hmac.new(key, body, hashlib.sha256).hexdigest()
assert signature == expected, "Report has been tampered with!"

CI gate example (GitHub Actions)

- name: Article 10 compliance gate
  run: spanforge compliance validate-dataset ./data/training/ --output json
  env:
    SPANFORGE_SIGNING_KEY: ${{ secrets.SPANFORGE_SIGNING_KEY }}

The step fails (non-zero exit) automatically when any Article 10 clause fails, blocking the workflow before fine-tuning begins.

Compliance & Tenant Isolation

Compatibility checklist (`test_compatibility`)

Audit chain integrity (`verify_chain_integrity`)

Multi-tenant isolation (`verify_tenant_isolation`)

Scope verification (`verify_events_scoped`)

Using compliance results

CI integration

Regulatory compliance mapping (`ComplianceMappingEngine`)

Supported frameworks

Generating an evidence package

CLI usage

Clause-to-event-prefix mapping

Model registry enrichment

Explanation coverage metric

Regulation-specific PII compliance (Phase 3)

GDPR — Article 17 (Right to Erasure)

CCPA — Data Subject Access Request (DSAR)

HIPAA — Safe Harbor De-identification

DPDP — India Digital Personal Data Protection Act (2023)

PIPL — China Personal Information Protection Law

See also

EU AI Act Article 10 — Training Data Compliance Scanner

Quickstart

Programmatic use

Article 10 clauses

Preparing a compliant dataset

CLI

HMAC signature verification

CI gate example (GitHub Actions)

See also

Compliance & Tenant Isolation

Compatibility checklist (test_compatibility)

Audit chain integrity (verify_chain_integrity)

Multi-tenant isolation (verify_tenant_isolation)

Scope verification (verify_events_scoped)

Using compliance results

CI integration

Regulatory compliance mapping (ComplianceMappingEngine)

Supported frameworks

Generating an evidence package

CLI usage

Clause-to-event-prefix mapping

Model registry enrichment

Explanation coverage metric

Regulation-specific PII compliance (Phase 3)

GDPR — Article 17 (Right to Erasure)

CCPA — Data Subject Access Request (DSAR)

HIPAA — Safe Harbor De-identification

DPDP — India Digital Personal Data Protection Act (2023)

PIPL — China Personal Information Protection Law

See also

EU AI Act Article 10 — Training Data Compliance Scanner

Quickstart

Programmatic use

Article 10 clauses

Preparing a compliant dataset

CLI

HMAC signature verification

CI gate example (GitHub Actions)

See also

Compatibility checklist (`test_compatibility`)

Audit chain integrity (`verify_chain_integrity`)

Multi-tenant isolation (`verify_tenant_isolation`)

Scope verification (`verify_events_scoped`)

Regulatory compliance mapping (`ComplianceMappingEngine`)