Runtime Governance GA Guide
SpanForge GA is centered on one operational story:
runtime request -> policy decision -> signed evidence -> operator review -> export package
This guide ties together the Phase 1 through Phase 6 runtime-governance surface so platform, security, and compliance teams can understand the product as one control plane instead of a set of disconnected SDK clients.
For the adjacent contract, calibration, and export details, see:
- Runtime Governance Contracts
- Replay, Simulation, and Calibration
- Evidence Export Guide
- Enterprise Integrations
GA Services
The May 2, 2026 GA runtime-governance surface consists of five core services, with Phase 1B/1C production-hardening applied to sf_explain, sf_scope, sf_rbac, and sf_validate (v1.0.1):
| Service | Purpose | Primary SDK | v1.0.1 additions |
|---|---|---|---|
| Explainability | Generate accountable "why" records for runtime decisions | sf_explain | ExplainModelType enum, retry + timeout emit |
| Scope enforcement | Prevent agents from using capabilities or resources outside their manifest | sf_scope | ACTION_CATEGORIES, circuit-breaker fail-secure |
| RBAC enforcement | Check actor roles before sensitive actions | sf_rbac | STANDARD_ROLE_MATRIX, YAML + JWT registration |
| RAG grounding | Score answer grounding and record source-level evidence | sf_rag | — |
| Lineage | Capture provenance across transformation and decision boundaries | sf_lineage | — |
The sf_validate surfaces are also available as governance inputs:
| Validate feature | v1.0.1 additions |
|---|---|
| Event validation | EnforcementMode (STRICT / LENIENT / WARN / CORRECT), ValidationResult, enforce_event() |
| HMAC signing | sign_event_hmac() — HMAC-SHA256 event signing |
| Dataset compliance | scan_dataset_compliance(), Article10Clause, DatasetComplianceReport — EU AI Act Article 10 scanner; report verifiable via spanforge audit check-health |
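As a rough illustration of what HMAC-SHA256 event signing involves, the sketch below signs a canonical JSON encoding of an event using only the Python standard library. It is not the sign_event_hmac() implementation; the function names, the event fields, and the canonicalization choice (sorted keys, compact separators) are assumptions made for the example.

```python
import hashlib
import hmac
import json


def sign_event_sketch(event: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the event."""
    # Sorted keys and fixed separators make the byte encoding deterministic,
    # so the same event always produces the same signature.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_event_sketch(event: dict, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_event_sketch(event, key), signature)


# Hypothetical event and key, for illustration only.
event = {"trace_id": "trace-123", "action": "human_review"}
key = b"demo-key-not-for-production"
signature = sign_event_sketch(event, key)
tampered = {**event, "action": "allow"}
```

Verifying the original event with the same key succeeds, while the tampered copy fails because its canonical bytes differ.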
These services are coordinated by:
- sf_policy for policy loading, activation, replay, simulation, promotion, and review loops
- sf_operator for the operator-facing trace inspection and export workflow
- sf_enterprise for deployment posture, retention/export controls, and enterprise evidence packaging
Runtime Policy Model
Runtime policy bundles are versioned per environment:
from spanforge.runtime_policy import RuntimePolicyBundle, RuntimePolicyRule

bundle = RuntimePolicyBundle(
    policy_id="prod-governance",
    version="2026.05.02",
    environment="prod",
    owner="platform-security",
    effective_at="2026-05-02T00:00:00Z",
    rules=[
        RuntimePolicyRule(
            rule_id="rag-grounding-prod",
            service="sf_rag",
            control="grounding_threshold",
            action="human_review",
            threshold=0.85,
            rationale="Escalate low-grounding answers before delivery.",
            metadata={"comparator": "lt"},
        ),
    ],
)
Supported runtime actions:
- allow
- allow+log
- redact
- block
- human_review
Those actions are emitted as signed policy decisions and then attached to explanation, scope, RBAC, grounding, and lineage evidence.
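To make the threshold rule concrete, here is a minimal sketch of how a rule like rag-grounding-prod could map a grounding score to an action. It does not use the SpanForge SDK: RuleSketch, decide(), and the comparator table are hypothetical stand-ins, and it assumes that "lt" means the rule fires when the observed value is strictly below the threshold, with allow as the fallback action.

```python
import operator
from dataclasses import dataclass

# Hypothetical comparator table; only "lt" appears in this guide.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


@dataclass(frozen=True)
class RuleSketch:
    """Stand-in for a single threshold rule (not the SDK's RuntimePolicyRule)."""
    rule_id: str
    action: str
    threshold: float
    comparator: str = "lt"


def decide(rule: RuleSketch, observed: float, default_action: str = "allow") -> str:
    """Return the rule's action when the comparison fires, else the default."""
    fires = COMPARATORS[rule.comparator](observed, rule.threshold)
    return rule.action if fires else default_action


rule = RuleSketch("rag-grounding-prod", "human_review", 0.85)
low = decide(rule, 0.72)   # below the threshold, so the rule fires
high = decide(rule, 0.91)  # at or above the threshold, so the default applies
```

Under these assumptions a score of 0.72 is escalated to human_review, while 0.91 falls through to allow.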
Operator Workflow
The operator path is intentionally narrow:
- Inspect a trace or run.
- Review the policy decision and its explanation.
- Check grounding, scope, RBAC, and lineage evidence in one place.
- Export a signed evidence package from the same path.
CLI:
spanforge operator inspect TRACE_ID
spanforge operator export TRACE_ID --output operator-package.json
SDK:
from spanforge.sdk import sf_operator
workflow = sf_operator.inspect_trace("trace-123")
package = sf_operator.export_package("trace-123", output_path="operator-package.json")
The workflow summary is built to answer the operator question directly:
- Why was this request allowed?
- Why was it blocked?
- Which controls contributed?
- Is the signed evidence chain valid?
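The last question, whether the evidence chain is valid, can be pictured with a simple hash chain. The sketch below is an assumption about how such a chain might be linked (each record stores the SHA-256 hash of its predecessor); it is not the format sf_operator actually uses, and the record fields are hypothetical.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 over a deterministic JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(payloads: list[dict]) -> list[dict]:
    """Link each record to the hash of the record before it."""
    chain, prev = [], ""
    for payload in payloads:
        record = {"payload": payload, "prev_hash": prev}
        chain.append(record)
        prev = record_hash(record)
    return chain


def chain_is_valid(chain: list[dict]) -> bool:
    """Check that every prev_hash matches the recomputed hash of the prior record."""
    prev = ""
    for record in chain:
        if record["prev_hash"] != prev:
            return False
        prev = record_hash(record)
    return True


# Hypothetical evidence records for one blocked request.
chain = build_chain([
    {"step": "policy_decision", "action": "block"},
    {"step": "scope_check", "result": "denied"},
    {"step": "explanation", "summary": "capability outside manifest"},
])
```

Editing any record other than the last breaks the link stored in its successor. In a scheme like this, the final record still needs an external anchor, such as a signature over its hash, to be tamper-evident.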
Replay, Simulation, and Calibration
sf_policy separates production decisions from candidate-policy testing:
- evaluate() records live production decisions
- simulate() runs a candidate bundle without changing production behavior
- replay() pushes historical events through a candidate bundle
- compare_policies() summarizes action changes between baseline and candidate bundles
- record_review() captures false-positive and tuning feedback
- suggest_threshold() derives tuning hints from reviewed decisions
This lets teams test changes such as:
- tightening a grounding threshold
- switching a failed scope check from human_review to block
- moving explainability coverage from allow+log to human_review
without changing live traffic first.
See the dedicated Replay, Simulation, and Calibration guide for the full Phase 3 workflow.
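The idea behind replaying history through a candidate policy can be sketched without the SDK. The example below tightens a grounding threshold from 0.85 to 0.90 and counts how many historical decisions would change. The helper names and the sample scores are hypothetical, and the code does not reflect the actual signatures of replay() or compare_policies().

```python
from collections import Counter


def action_for(score: float, threshold: float, action: str) -> str:
    """Apply a single 'lt' grounding rule; scores at or above the threshold pass."""
    return action if score < threshold else "allow"


def compare_thresholds(scores, baseline, candidate, action="human_review"):
    """Count (baseline_action, candidate_action) pairs over historical scores."""
    changes = Counter()
    for score in scores:
        before = action_for(score, baseline, action)
        after = action_for(score, candidate, action)
        changes[(before, after)] += 1
    return changes


# Hypothetical grounding scores taken from past traffic.
historical_scores = [0.95, 0.88, 0.86, 0.80, 0.91]
summary = compare_thresholds(historical_scores, baseline=0.85, candidate=0.90)
```

With these sample scores, two requests that were previously allowed would move to human_review under the tighter threshold, which is the kind of delta a team would want to see before promoting a candidate bundle.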
Evidence Exports
SpanForge now has two export layers:
| Export | Producer | Intended audience |
|---|---|---|
| Operator evidence package | sf_operator.export_package() | operators, incident responders, control owners |
| Enterprise evidence package | sf_enterprise.generate_evidence_package() | auditors, security, platform review boards |
The enterprise package wraps:
- deployment profile
- retention and export controls
- enterprise status
- operator workflow package
- reference deployment architectures
- signed checksum and signature
See the dedicated Evidence Export Guide for operator package, enterprise package, JSONL, SIEM, and OpenInference coverage.
Deployment Credibility
The runtime-governance story assumes real enterprise deployment constraints:
- self-hosted deployment
- air-gapped deployment
- tenant and project isolation
- retention and export controls
- reference architectures for Docker Compose and Kubernetes
See:
Demo Paths
Two runnable demos ship with the repo for the Phase 7 story:
The matching scripts live in examples/ and are intended to be executable from a clean local checkout.