Changelog
All notable changes to spanforge are documented here. The format follows Keep a Changelog and this project adheres to Semantic Versioning.
[2.0.14] — Unreleased
F-series + Compliance value hardening: Async SDK, RAG Auto-instrumentation, Feedback Endpoint, Gate Coverage, Batch Exporter Tests, Compliance Readiness
Added — Compliance value hardening (session 3)
- Remediation guidance — `_FRAMEWORK_CLAUSES` now includes `remediation_steps` for every clause across all six frameworks (SOC 2, HIPAA, GDPR, NIST AI RMF, EU AI Act, ISO 42001). Gap reports render each step as `> **Fix**: <steps>` in the Markdown output.
- Markdown reports — `ComplianceEvidencePackage.to_markdown()` method added. `spanforge compliance report` now accepts `--format markdown` (writes `<prefix>_report.md`) and `--format both` (writes both JSON and Markdown in one pass).
- `spanforge compliance readiness` command — scored pre-production checklist for any supported framework. Checks signing key, event store, evidence package generation, gap count, and attestation. Exits 0 (all pass), 1 (failures present), or 2 (unknown framework).
- Live compliance posture in `spanforge doctor` — after the PII Engine section, doctor now queries the event store, runs `generate_evidence_package(framework="eu_ai_act")`, and prints the passing/total clause count with a list of gap/partial clauses.
Tests — Compliance hardening
- `TestRemediationSteps` — verifies every clause in every framework has a non-empty `remediation_steps` string (length > 20).
- `test_gap_report_text_contains_remediation` — asserts `> **Fix**:` appears in gap report output.
- `test_to_markdown_returns_report_text` — asserts `to_markdown()` is identical to `report_text` and contains `# spanforge Compliance Report`.
- `TestCmdReadiness` — smoke tests for exit codes 0/1/2, all-framework acceptance, and the signing-key environment variable check.
- `test_cmd_report_markdown_format`, `test_cmd_report_both_format_writes_json_and_markdown` — coverage for the two new `--format` modes.
- `test_compliance_readiness_registered_in_parser`, `test_dispatch_routes_readiness` — CLI wiring tests.
- Full suite after this session: 6 109 passed, 14 skipped, 0 failed. ruff 0 · mypy 0 · bandit 0.
Added — Phase 7 documentation and demos
- Added a new runtime governance GA guide that consolidates the Phase 1 through Phase 6 runtime-governance story into one operator and buyer-facing control-plane narrative.
- Added dedicated Phase 0/3/5/7 alignment pages:
  - `docs/runtime-governance-contracts.md`
  - `docs/replay-simulation.md`
  - `docs/evidence-export.md`
  - `docs/enterprise-integrations.md`
  - `docs/competitor-comparison.md`
  - `docs/ga-release-notes.md`
- Added focused API documentation for:
  - `spanforge.sdk.explain`
  - `spanforge.sdk.policy`
  - `spanforge.sdk.scope`
  - `spanforge.sdk.rbac`
  - `spanforge.sdk.lineage`
  - `spanforge.sdk.operator`
- Expanded API reference coverage for:
  - `spanforge.integrations.azure_openai`
  - `spanforge.integrations.langgraph`
  - `spanforge.export.openinference`
  - `spanforge.export.siem_schema`
Added — Phase 8 release hardening
- Added `tests/test_phase8_release_hardening.py` to lock the GA release gate around:
  - end-to-end trace-to-enterprise-evidence workflow verification
  - enforceability of all five runtime policy actions
  - malformed runtime policy input validation
  - incomplete replay/comparison event rejection
  - degraded-mode RAG behavior when the local observe path times out
Fixed — runtime governance hardening
- `spanforge.runtime_policy` now raises clear `ValueError` messages for malformed bundle and rule dictionaries instead of leaking raw key errors.
- `spanforge.sdk.policy` now validates historical replay and comparison events before evaluation, producing clearer failures for incomplete or mismatched payloads.
- `spanforge.sdk.rag` now normalizes internal `timeout` session states to a schema-valid session summary status during `get_session()` and `end_session()`.
- Added two runnable Phase 7 demo scripts:
  - `examples/runtime_governance_demo.py`
  - `examples/enterprise_evidence_demo.py`
- Added matching walkthrough docs:
  - `docs/demos/runtime-governance-demo.md`
  - `docs/demos/enterprise-evidence-demo.md`
- Added `docs/reference-architectures.md` to centralize the self-hosted, Kubernetes, and air-gapped reference deployment artifacts surfaced by enterprise evidence packaging.
- Updated `README.md`, `docs/index.md`, `docs/quickstart.md`, `docs/api/index.md`, `docs/user_guide/index.md`, and `docs/api/enterprise.md` so the runtime-governance and enterprise evidence paths are discoverable from the main entry points.
Changed — CLI modularization, package-root cleanup, and guardrails
- Split the large CLI router into focused command modules:
  - `spanforge._cli_audit`
  - `spanforge._cli_cost`
  - `spanforge._cli_ops`
  - `spanforge._cli_phase11`
- Reduced top-level package import coupling in `spanforge.__init__` by moving module-style and selected grouped exports behind lazy resolution.
- Added explicit CI drift guardrails in `.github/workflows/ci.yml` plus `tests/test_repo_guardrails.py` to fail fast on:
  - `spanforge.__version__` / `pyproject.toml` mismatch
  - stale known-bad documentation patterns
  - documented CLI entrypoints that no longer parse
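Lazy export resolution in a package `__init__` is commonly built on a module-level `__getattr__` (PEP 562). The sketch below shows that general pattern in isolation. It is an assumption about the technique, not spanforge's code: `make_lazy_module` and `LAZY_EXPORTS` are hypothetical names, and standard-library modules stand in for real submodules so the block runs on its own. In a real package, `__getattr__` and `__dir__` would be defined at the top level of `__init__.py`.

```python
import importlib
import types
from typing import Any

# Hypothetical export table: public name -> (module to import, attribute to fetch).
LAZY_EXPORTS: dict[str, tuple[str, str]] = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def make_lazy_module(name: str, exports: dict[str, tuple[str, str]]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access (PEP 562)."""
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> Any:
        # Called only when normal attribute lookup on the module fails.
        try:
            module_path, target = exports[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_path), target)
        setattr(module, attr, value)  # cache so later lookups bypass __getattr__
        return value

    def __dir__() -> list[str]:
        # Advertise lazy names to dir() and tab completion before they are loaded.
        return sorted(set(vars(module)) | set(exports))

    module.__getattr__ = __getattr__  # type: ignore[attr-defined]
    module.__dir__ = __dir__  # type: ignore[attr-defined]
    return module
```

The first access to `lazy.sqrt` triggers the import, and the result is cached. Unknown names still raise `AttributeError`, so `hasattr()` and `from pkg import name` behave normally.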
Fixed
- `spanforge.normalizer.GenericNormalizer` now sets a valid `ModelInfo.custom_system_name` when returning `_custom` model-system metadata.
Tests
- Added direct unit coverage for extracted CLI modules, especially `spanforge._cli_compliance`, so module-level coverage reflects the post-refactor architecture instead of relying only on router integration tests.
- Added focused deep-coverage suites for:
  - `spanforge.sdk.pipelines`
  - `spanforge.gate`
  - `spanforge.egress`
  - `spanforge.normalizer`
- Full suite status after these changes:
  - 6 001 passed, 14 skipped
  - overall coverage: 91.72%
Added — spanforge.auto — RAG Auto-instrumentation (F-20)
- `trace_rag(func)` decorator — wraps any retrieval callable; calls `sf_rag.trace_query()` before and `sf_rag.trace_retrieval()` after; fail-safe (never raises).
- `_patch_rag_llama_index()` — monkey-patches `llama_index.core.retrievers.VectorIndexRetriever.retrieve` to emit RAG spans automatically when LlamaIndex is installed.
- `_patch_rag_langchain()` — monkey-patches `langchain_core.retrievers.BaseRetriever.invoke` to emit RAG spans automatically when LangChain Core is installed.
- `setup()` extended — now calls both RAG patches after the LLM patches; the returned set now includes `"llama_index:rag"` / `"langchain_core:rag"` entries.
- `teardown()` extended — calls `_unpatch_rag_llama_index()` and `_unpatch_rag_langchain()` to cleanly restore the original methods.
- `__all__` updated to export `trace_rag`.
- All RAG import calls wrapped in `warnings.catch_warnings()` with `warnings.simplefilter("ignore")` to suppress the NumPy reload `RuntimeWarning` in test environments.
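The fail-safe contract described for `trace_rag` can be sketched as a plain decorator. Under this contract, tracing hooks run before and after the retrieval call, and a tracing failure never breaks the caller. `make_trace_decorator`, `on_query`, and `on_retrieval` are hypothetical stand-ins for the `sf_rag.trace_query()` / `sf_rag.trace_retrieval()` calls. The sketch also assumes that "never raises" covers tracing errors only, so exceptions from the retriever itself still propagate.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def make_trace_decorator(
    on_query: Callable[[Any], None],
    on_retrieval: Callable[[Any, Any], None],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Build a decorator that calls tracing hooks around a retriever without ever failing it."""

    def trace(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(query: Any, *args: Any, **kwargs: Any) -> Any:
            try:
                on_query(query)
            except Exception:  # a tracing fault must not break retrieval
                logger.debug("on_query hook failed", exc_info=True)
            results = func(query, *args, **kwargs)  # retriever errors still propagate
            try:
                on_retrieval(query, results)
            except Exception:
                logger.debug("on_retrieval hook failed", exc_info=True)
            return results

        return wrapper

    return trace
```

Swallowed hook failures are logged at DEBUG level, so they remain diagnosable without surfacing to callers.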
Added — Async SDK methods (F-10)
- `SFPIIClient.scan_async(text, *, language="en", score_threshold=0.5)` — non-blocking async PII scan via `run_in_executor`.
- `SFGateClient.evaluate_async(gate_id, payload, *, project_id="", pipeline_id="")` — non-blocking async gate evaluation.
- `SFCECClient.build_bundle_async(project_id, date_range, frameworks=None)` — non-blocking async CEC bundle build.
- `SFTrustClient.get_scorecard_async(project_id=None, *, from_dt=None, to_dt=None, weights=None)` — non-blocking async T.R.U.S.T. scorecard computation.
- `SFIdentityClient.sso_delegate_session_async(idp_session_id, subject, *, email="", project_id="default")` — non-blocking async SSO session delegation.
- All five async methods use `asyncio.get_event_loop().run_in_executor(None, functools.partial(...))`.
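The executor-offload pattern behind these methods can be sketched as follows. `PIIScannerSketch` is a hypothetical stand-in, not `SFPIIClient`. `run_in_executor` forwards only positional arguments, so keyword-only parameters are bound with `functools.partial`. Inside a coroutine, `asyncio.get_running_loop()` returns the same loop as `get_event_loop()`, and it is the form the Python documentation prefers in coroutines.

```python
import asyncio
import functools
import re


class PIIScannerSketch:
    """Hypothetical blocking client used to illustrate the async wrapper; not SFPIIClient."""

    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def scan(self, text: str, *, language: str = "en", score_threshold: float = 0.5) -> list[str]:
        # Stand-in for blocking work (a model call or HTTP request in a real client).
        # language and score_threshold are accepted for signature parity only.
        return self._EMAIL.findall(text)

    async def scan_async(
        self, text: str, *, language: str = "en", score_threshold: float = 0.5
    ) -> list[str]:
        loop = asyncio.get_running_loop()
        # run_in_executor(executor, func, *args) has no **kwargs slot, so bind them first.
        call = functools.partial(
            self.scan, text, language=language, score_threshold=score_threshold
        )
        # None selects the loop's default ThreadPoolExecutor.
        return await loop.run_in_executor(None, call)
```

Several scans can then run concurrently with `await asyncio.gather(*(scanner.scan_async(t) for t in texts))`. Each call still occupies a worker thread in the default executor.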
Added — spanforge._server — POST /v1/feedback (F-21)
- `POST /v1/feedback` route added to the embedded HTTP server.
- Accepts: `session_id`, `trace_id`, `rating` (required); `comment`, `user_id`, `source`, `metadata`, `linked_trust_dimension` (optional).
- Returns: `{"feedback_id": "<ulid>", "accepted": true}` with HTTP 201.
- Delegates to `sf_feedback.submit()`.
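Below is a client-side sketch of a request to the new route, using only the standard library. The base URL and the `"thumbs_up"` rating string are assumptions, since the accepted rating encoding is not specified here. `build_feedback_request` is a hypothetical helper. It checks the required and optional field names listed above, then builds the request without sending it.

```python
import json
import urllib.request

REQUIRED_FIELDS = ("session_id", "trace_id", "rating")
OPTIONAL_FIELDS = {"comment", "user_id", "source", "metadata", "linked_trust_dimension"}


def build_feedback_request(base_url: str, **fields: object) -> urllib.request.Request:
    """Validate field names and build (but do not send) a POST /v1/feedback request."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    unknown = set(fields) - set(REQUIRED_FIELDS) - OPTIONAL_FIELDS
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={sorted(unknown)}")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/feedback",
        data=json.dumps(fields).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running server, sending the request with `urllib.request.urlopen(req)` should yield HTTP 201 and a JSON body containing `feedback_id` and `accepted`.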
Added — docs/api/drift.md
- Note callout added after the `BehaviouralBaseline` constructor description: an LLM-only warning that the `tokens` and `confidence_by_type` fields are only populated when `detector_type="llm"`.
Tests
- `tests/test_coverage_gaps.py::TestGateExecutorSubprocessMocks` — 27 new tests covering all 6 built-in `subprocess.run` gate executors with dedicated mocks (F-42):
  - `_exec_schema_validation` (4): command PASS, command FAIL (with/without stderr), generic exception.
  - `_exec_dependency_security` (6): PASS (no vulns), FAIL (critical CVEs parsed), JSON parse error, timeout, generic exception, custom command.
  - `_exec_secrets_scan` (4): secrets detected FAIL, fallback to unstaged diff, ImportError, generic exception.
  - `_exec_performance_regression` (3): command PASS, command FAIL, generic exception.
  - `_exec_halluccheck_prri` (4): command + artifact combo, timeout, malformed JSON, generic exception.
  - `_exec_halluccheck_trust` (6): SDK PASS, SDK FAIL, artifact PASS, artifact FAIL, malformed artifact, no-SDK-no-artifact WARN.
- `tests/test_batch_exporter.py` — 17 new tests across 5 classes (`TestBatchExporterPutFlush`, `TestBatchExporterShutdown`, `TestBatchExporterQueueFull`, `TestBatchExporterCircuitBreaker`, `TestBatchExporterHealth`). Covers put/flush, shutdown, queue-full, circuit breaker open/half-open/reset, and aggregate health (F-45).
- `tests/test_prompt_registry.py` — 19 new tests across 6 classes (`TestPromptVersion`, `TestRegister`, `TestGetList`, `TestRender`, `TestModuleFunctions`, `TestThreadSafety`). Covers the version dataclass, registry CRUD, template rendering, module-level helpers, and thread-safe concurrent registration (F-45).
- `tests/test_sf_gate.py` — 22 new tests: `TestInferVerdictBranches` (15) for every `_infer_verdict` branch; `TestPostEvaluateHooksSideEffects` (7) for hook execution, hook failure isolation, multi-hook ordering, and metrics access (F-42).
- `tests/test_sf13.py` — 3 new test classes: `TestSF13D` (WORM upload on rotation), `TestSF13E` (rotation-by-size + no-rotation-when-zero), `TestSF13F` (`append_batch` count/content/empty) (F-41).
- Total: 5 863 passed, 14 skipped, 0 failed. Coverage: 91.08 %.
Fixed
- `tests/test_sf_gate.py` — removed the stray `timestamp` kwarg from the `GateEvaluationResult()` frozen-dataclass constructor in `TestPostEvaluateHooksSideEffects.setUp`.
- `tests/test_auto.py` — replaced the `_rag_libs` wholesale-deletion fixture with save/restore of `sys.modules` entries (`_rag_saved`), so RAG library blocking works without triggering the NumPy reload `RuntimeWarning` cascade.
- `src/spanforge/auto.py` — reverted an erroneous `_RAG_PATCHED.clear()` call that caused repeated LlamaIndex reimports and NumPy reload warnings.
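The save/restore fixture change relies on standard import-system behaviour: a `None` entry in `sys.modules` makes `import <name>` raise `ModuleNotFoundError`. A minimal sketch of that technique as a context manager follows. `blocked_imports` is a hypothetical name, not the `_rag_saved` fixture itself. Restoring each saved entry, rather than deleting modules wholesale, avoids forcing a later re-import. That kind of reload can re-trigger warnings from extension-heavy libraries such as NumPy.

```python
import contextlib
import sys
from typing import Iterator


@contextlib.contextmanager
def blocked_imports(*names: str) -> Iterator[None]:
    """Make `import <name>` fail inside the block, then restore sys.modules exactly."""
    absent = object()  # sentinel: the name was not in sys.modules before
    saved = {name: sys.modules.get(name, absent) for name in names}
    try:
        for name in names:
            sys.modules[name] = None  # type: ignore[assignment]  # None => ModuleNotFoundError
        yield
    finally:
        for name, original in saved.items():
            if original is absent:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = original  # reuse the already-imported module object
```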
[2.0.13] — Unreleased
Phase 1 SSO completion + Phase 5 CEC endpoint additions
Added — spanforge.sdk.identity — Full SSO Suite (ID-040–ID-043)
- SAML 2.0 ACS (ID-040) — `saml_acs()` now processes real `SAMLResponse` payloads in local mode. Decodes base64 XML, extracts `NameID`, and issues a SpanForge session JWT. `saml_metadata()` now returns complete SP metadata XML with an `AssertionConsumerService` binding.
- SCIM 2.0 User provisioning (ID-041) — Full CRUD: `scim_create_user()`, `scim_get_user()`, `scim_list_users()` (with `eq` filter), `scim_patch_user()` (active/displayName/emails), `scim_delete_user()`. Thread-safe in-memory store with `userName` uniqueness enforcement.
- SCIM 2.0 Group provisioning (ID-041) — `scim_create_group()`, `scim_list_groups()`, `scim_delete_group()`. Membership is maintained bidirectionally between user and group records.
- OIDC relying party (ID-042) — `oidc_authorize()` generates a PKCE authorization URL (RFC 7636, S256) with `state`, `nonce`, `code_verifier`, `code_challenge`. `oidc_callback()` validates the CSRF state, enforces a 10-minute state TTL, and issues a SpanForge session JWT.
- SSO session delegation (ID-043) — `sso_delegate_session()` creates a SpanForge session bound to an IdP session id. `sso_revoke_idp_session()` propagates IdP-side revocation. `sso_get_session()` retrieves session state.
- New types — `SCIMUser`, `SCIMGroup`, `SCIMListResponse`, `OIDCAuthRequest`, `OIDCTokenResult`, `SSOSession` added to `spanforge.sdk._types` and exported from `spanforge.sdk`.
Added — spanforge.sdk.cec — Bundle Registry & URL Re-issue (CEC-003, CEC-004)
- `build_bundle()` now registers results in an in-memory `_bundle_registry` keyed by `bundle_id`.
- `get_bundle(bundle_id)` — Retrieves a previously built bundle from the session registry; returns `None` if not found.
- `reissue_download_url(bundle_id)` — Re-issues a fresh `expires_at` (+24 h) for an existing bundle without rebuilding the ZIP. Raises `SFCECBuildError` if the bundle is unknown or the ZIP was deleted.
Added — spanforge._server — New REST Endpoints (CEC-003, CEC-004)
- `POST /v1/risk/cec` — Builds a CEC compliance evidence bundle. Accepts `project_id`, `date_range`, and `frameworks` in the JSON body. Returns `bundle_id`, `download_url`, `expires_at`, `hmac_manifest`, `record_counts`. HTTP 201.
- `GET /v1/risk/cec/{bundle_id}` — Re-issues a fresh download URL for an existing bundle. Returns 404 if the `bundle_id` is not in the session registry.
Tests
- 38 new SSO tests across `TestSAMLStub`, `TestSCIMUsers`, `TestSCIMGroups`, `TestOIDC`, `TestSSOSessionDelegation`, `TestReissueDownloadUrl`.
- 13 new chaos tests in `tests/chaos/test_service_unavailability.py` (DX-023): PII/secrets/audit/observe/identity fallback, no-secrets-in-logs assertions, network-partition simulation.
- Total: 5 766 passed, 14 skipped. Coverage: 90.01 %.
[2.0.12] — Unreleased
Phase 13: RAG Tracing, User Feedback, Async SDK & LangSmith Migration
Added — spanforge.sdk.rag — RAG Tracing Client (Phase 13)
- `SFRAGClient` — End-to-end tracing for Retrieval-Augmented Generation pipelines. Six methods: `trace_query()`, `trace_retrieval()`, `trace_generation()`, `end_session()`, `get_session()`, `get_status()`.
- `RAGStatusInfo` — Health DTO with `healthy`, `version`, `mode`, `service`, `total_sessions`, `active_sessions`.
- `_RAGSession` — Internal per-session accumulator (query span, retrieved chunks, generation span).
- Privacy controls — `include_content=False` drops raw text; `include_metadata=False` drops chunk metadata; `include_query_text=False` anonymises the query.
- Grounding scores — `grounding_score` (0–1) on `trace_generation()` measures answer fidelity against retrieved chunks.
- T.R.U.S.T. integration — Retrieval quality + grounding scores feed the Reliability pillar of the T.R.U.S.T. scorecard.
Added — spanforge.sdk.feedback — User Feedback Client (Phase 13)
- `SFFeedbackClient` — Collects, queries, and aggregates user-facing feedback on LLM responses. Five methods: `submit()`, `get_feedback()`, `get_summary()`, `link_to_trust()`, `get_status()`.
- `FeedbackStatusInfo` — Health DTO with `healthy`, `version`, `mode`, `service`, `total_submissions`, `linked_trust_sessions`.
- `FeedbackRating` enum (13 values) — `THUMBS_UP`, `THUMBS_DOWN`, `STAR_1`–`STAR_5`, `NPS_0`–`NPS_10` (anchors only), `CSAT_1`–`CSAT_5`.
- `FeedbackSubmittedPayload` / `FeedbackSummaryPayload` — Namespace payload dataclasses in `spanforge.namespaces.feedback`.
Added — spanforge.namespaces.retrieval — RAG Namespace Payloads (Phase 13)
- 5 dataclasses: `RetrievedChunk`, `RetrievalQueryPayload`, `RetrievalResultPayload`, `RAGSpanPayload`, `RAGSessionPayload`.
Added — spanforge.namespaces.feedback — Feedback Namespace Payloads (Phase 13)
- `FeedbackRating` enum with `numeric_value()` helper.
- `FeedbackSubmittedPayload` and `FeedbackSummaryPayload` dataclasses.
Added — Async SDK methods (F-10)
- `SecretsScanner.scan_async()` — Non-blocking async variant of `scan()` using `asyncio.to_thread`.
- `SFAuditClient.append_async()` — Async append for use in async application code.
- `SFObserveClient.emit_span_async()` — Async span emission.
- `SFAlertClient.publish_async()` — Async alert publishing.
Added — migrate_from_langsmith() (F-27)
- `spanforge.migrate.migrate_from_langsmith(runs, *, source="langsmith-import")` — Converts a list of LangSmith run dicts to SpanForge v2 event dicts. Maps `run_type` → `event_type` using a 4-entry lookup table. Stores only input/output key names (not raw values) for privacy. Truncates `error` fields to 500 characters.
- CLI: `spanforge migrate-langsmith` command added.
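The privacy rules above can be sketched for a single run dict: keep input/output key names, drop raw values, and cap `error` at 500 characters. The run fields used (`run_type`, `name`, `inputs`, `outputs`, `error`) follow LangSmith's run schema. The `EXAMPLE_EVENT_TYPES` table and the flat output shape are hypothetical placeholders, not the real 4-entry lookup or the SpanForge v2 event envelope.

```python
from typing import Any

MAX_ERROR_CHARS = 500

# Hypothetical 4-entry lookup; the real run_type -> event_type table is not reproduced here.
EXAMPLE_EVENT_TYPES = {
    "llm": "example.llm.call",
    "chain": "example.chain.run",
    "tool": "example.tool.call",
    "retriever": "example.retrieval.query",
}


def convert_run(
    run: dict[str, Any],
    *,
    source: str = "langsmith-import",
    fallback_event_type: str = "example.unknown",
) -> dict[str, Any]:
    """Convert one LangSmith-style run dict, keeping only key names for inputs/outputs."""
    payload: dict[str, Any] = {
        "name": run.get("name"),
        # Key names only: raw prompt and completion values are never copied.
        "input_keys": sorted((run.get("inputs") or {}).keys()),
        "output_keys": sorted((run.get("outputs") or {}).keys()),
    }
    error = run.get("error")
    if error:
        payload["error"] = str(error)[:MAX_ERROR_CHARS]
    return {
        "event_type": EXAMPLE_EVENT_TYPES.get(run.get("run_type", ""), fallback_event_type),
        "source": source,
        "payload": payload,
    }
```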
Fixed — DriftDetector constructor parameters (F-09)
- Renamed `zscore_threshold` → `z_threshold`.
- Removed the non-existent `circuit_breaker_reset_seconds` parameter.
- Added `window_seconds` (int, default 3600), `auto_emit` (bool, default True), `metric_ttl_seconds` (int, default 86400).
- Corrected the default `window_size` from 100 → 500.
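To show what a `z_threshold` over a `window_size` window typically means, here is a generic rolling-window z-score check. It is an illustrative sketch only: `RollingZScore`, `min_samples`, and the 3.0 default threshold are assumptions. The sketch does not model `window_seconds`, `auto_emit`, or `metric_ttl_seconds`, and it is not `DriftDetector`'s implementation.

```python
import statistics
from collections import deque


class RollingZScore:
    """Generic rolling-window z-score check (illustrative; not spanforge's DriftDetector)."""

    def __init__(self, window_size: int = 500, z_threshold: float = 3.0, min_samples: int = 30):
        self.window: deque[float] = deque(maxlen=window_size)  # oldest values fall off
        self.z_threshold = z_threshold
        self.min_samples = min_samples  # avoid flagging before the baseline is meaningful

    def observe(self, value: float) -> bool:
        """Return True if value lies more than z_threshold std-devs from the window mean."""
        drifted = False
        if len(self.window) >= self.min_samples:
            mean = statistics.fmean(self.window)
            stdev = statistics.pstdev(self.window)
            if stdev > 0:
                drifted = abs(value - mean) / stdev > self.z_threshold
        self.window.append(value)  # the new value joins the baseline after being scored
        return drifted
```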
Quality gates
- 70 new tests — RAG client, feedback client, async methods, LangSmith migration, DriftDetector params.
- 5 715 total (12 skipped) — full regression pass, zero failures.
- Coverage: 90.01% overall.
- ruff clean.
[Unreleased] — Cross-cutting additions
Added — spanforge.export.siem_splunk — Splunk HEC Exporter
- `SplunkHECExporter` — Thread-safe, batched exporter for Splunk HTTP Event Collector. Buffers events up to a configurable `batch_size` (default 50), then flushes them in a single HEC HTTP request. Supports SSL verification, configurable index / source / sourcetype, and use as a context manager. All parameters fall back to `SPANFORGE_SPLUNK_*` environment variables.
- `SplunkHECError` — Raised on permanent HEC delivery failures (HTTP errors, URL parse failure, network errors).
- Security — The HEC token is excluded from `repr()` and log output. HTTP to non-localhost addresses emits a `WARNING`. `verify_ssl=False` is supported for controlled lab environments only.
- Env vars added: `SPANFORGE_SPLUNK_HEC_URL`, `SPANFORGE_SPLUNK_HEC_TOKEN`, `SPANFORGE_SPLUNK_INDEX`, `SPANFORGE_SPLUNK_SOURCE`, `SPANFORGE_SPLUNK_SOURCETYPE`, `SPANFORGE_SPLUNK_BATCH_SIZE`, `SPANFORGE_SPLUNK_TIMEOUT`.
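This sketch covers the batching and payload shape described above. Splunk HEC accepts several event objects concatenated in one request body and authenticates with an `Authorization: Splunk <token>` header. `HECBatcher` and its defaults (`index="main"`, `sourcetype="_json"`) are assumptions, not `SplunkHECExporter`'s API. The HTTP POST, retries, TLS settings, and a final flush on shutdown are left out.

```python
from __future__ import annotations

import json
import threading
from dataclasses import dataclass, field


@dataclass(eq=False)
class HECBatcher:
    """Illustrative HEC batching buffer; not the real SplunkHECExporter."""

    token: str = field(repr=False)  # keep the secret out of repr(), as the entry describes
    index: str = "main"
    source: str = "spanforge"
    sourcetype: str = "_json"
    batch_size: int = 50
    _buffer: list = field(default_factory=list, init=False, repr=False)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def add(self, event: dict) -> bytes | None:
        """Buffer one event; return an encoded body once batch_size events are queued."""
        with self._lock:
            self._buffer.append(event)
            if len(self._buffer) < self.batch_size:
                return None
            batch, self._buffer = self._buffer, []
        return self.encode(batch)  # encode outside the lock to keep the critical section short

    def encode(self, batch: list[dict]) -> bytes:
        # One HEC envelope per event, newline-separated, sent as a single request body.
        envelopes = (
            json.dumps({"event": e, "index": self.index, "source": self.source,
                        "sourcetype": self.sourcetype})
            for e in batch
        )
        return "\n".join(envelopes).encode("utf-8")

    def headers(self) -> dict[str, str]:
        return {"Authorization": f"Splunk {self.token}", "Content-Type": "application/json"}
```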
Added — spanforge.export.siem_syslog — Syslog / CEF Exporter
- `SyslogExporter` — Forwards events to a remote syslog receiver via UDP (default) or TCP. Supports RFC 5424 and ArcSight Common Event Format (CEF) encoding. Severity is derived from the `event_type` prefix (`error`→3, `warn`→4, `info`→6, `debug`/`trace`→7). Facility is configurable (0–23, default `16` = local0). All parameters fall back to `SPANFORGE_SYSLOG_*` environment variables.
- `SyslogExporterError` — Wraps `OSError` from socket failures.
- CEF — Escapes `\`, `|`, `=` in extension values. Extension fields include `event_id`, `event_type`, `source`, `ts`, and the JSON-serialised `payload`.
- Env vars added: `SPANFORGE_SYSLOG_HOST`, `SPANFORGE_SYSLOG_PORT`, `SPANFORGE_SYSLOG_TRANSPORT`, `SPANFORGE_SYSLOG_FORMAT`, `SPANFORGE_SYSLOG_APP_NAME`, `SPANFORGE_SYSLOG_FACILITY`.
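The severity mapping and CEF escaping above can be sketched as pure functions. In RFC 5424 the PRI value is `facility * 8 + severity`. With the default facility 16 (local0), an `info` event (severity 6) is therefore sent as `<134>`. The fallback severity of 6 for unmatched prefixes, the `spanforge` app-name default, and all function names are assumptions. The sketch omits the CEF header and the socket transport.

```python
# (prefix, severity) pairs from the entry above; checked in order.
_SEVERITY_BY_PREFIX = (("error", 3), ("warn", 4), ("info", 6), ("debug", 7), ("trace", 7))


def severity_for(event_type: str, default: int = 6) -> int:
    """Map an event_type prefix to a syslog severity (the no-match default is an assumption)."""
    lowered = event_type.lower()
    for prefix, severity in _SEVERITY_BY_PREFIX:
        if lowered.startswith(prefix):
            return severity
    return default


def pri(facility: int, severity: int) -> int:
    """RFC 5424 PRIVAL: facility * 8 + severity."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be 0-23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be 0-7")
    return facility * 8 + severity


def format_rfc5424(facility: int, event_type: str, msg: str, *, timestamp: str,
                   hostname: str = "-", app_name: str = "spanforge") -> str:
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG; "-" = nil
    return f"<{pri(facility, severity_for(event_type))}>1 {timestamp} {hostname} {app_name} - - - {msg}"


def cef_escape_ext(value: str) -> str:
    """Escape backslash, pipe and equals in a CEF extension value (backslash first)."""
    return value.replace("\\", "\\\\").replace("|", "\\|").replace("=", "\\=")
```

Escaping the backslash first matters: doing it last would double the backslashes just added in front of `|` and `=`.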
Fixed — spanforge.migrate — v1_to_v2 empty tags
- Bug: `v1_to_v2()` returned `tags=None` when a v1 event had no tags, causing downstream code to fail on `Tags` attribute access. It now always returns `Tags(**raw_tags)` (an empty `Tags()` object when no tags are present).
Quality gates
- 223 new tests — comprehensive coverage of the `siem_splunk`, `siem_syslog`, `cache`, `deprecations`, `governance`, and `lint` modules (all new test files).
- 5 645 total (12 skipped) — full regression pass, zero failures.
- Coverage: 90.15% overall; all new modules at high branch + line coverage.
- ruff clean, mypy strict clean, bandit clean.
[2.0.11] — Unreleased
Phase 12: Developer Experience & Ecosystem
Added — spanforge.testing_mocks (Phase 12)
- `MockSFIdentity` (DX-003) — Mock for `SFIdentityClient`. All 18 methods: `issue_api_key()`, `rotate_api_key()`, `revoke_api_key()`, `validate_api_key()`, `create_session()`, `validate_session()`, `revoke_session()`, `issue_magic_link()`, `exchange_magic_link()`, `enroll_totp()`, `verify_totp()`, `generate_backup_codes()`, `set_ip_allowlist()`, `check_rate_limit()`, `get_status()`, plus the `healthy` property.
- `MockSFPII` (DX-003) — Mock for `SFPIIClient`. 14 methods: `scan_text()`, `anonymise()`, `scan_batch()`, `apply_pipeline_action()`, `get_status()`, `erase_subject()`, `export_subject_data()`, `safe_harbor_deidentify()`, `audit_training_data()`, `get_pii_stats()`, plus the `healthy` property.
- `MockSFSecrets` (DX-003) — Mock for `SFSecretsClient`: `scan()`, `scan_batch()`, `get_status()`.
- `MockSFAudit` (DX-003) — Mock for `SFAuditClient`: `append()`, `query()`, `verify_chain()`, `export()`, `sign()`, `get_status()`, `get_trust_scorecard()`, `generate_article30_record()`.
- `MockSFObserve` (DX-003) — Mock for `SFObserveClient`: `emit_span()`, `export_spans()`, `add_annotation()`, `get_annotations()`, `get_status()`, plus the `healthy` / `last_export_at` properties.
- `MockSFGate` (DX-003) — Mock for `SFGateClient`: `evaluate()`, `evaluate_prri()`, `run_pipeline()`, `get_artifact()`, `list_artifacts()`, `purge_artifacts()`, `get_status()`, `configure()`.
- `MockSFCEC` (DX-003) — Mock for `SFCECClient`: `build_bundle()`, `verify_bundle()`, `generate_dpa()`, `get_status()`.
- `MockSFAlert` (DX-003) — Mock for `SFAlertClient`: `publish()`, `acknowledge()`, `register_topic()`, `set_maintenance_window()`, `remove_maintenance_windows()`, `get_alert_history()`, `get_status()`, plus the `healthy` property.
- `MockSFTrust` (DX-003) — Mock for `SFTrustClient`: `get_scorecard()`, `get_badge()`, `get_history()`, `get_status()`.
- `MockSFEnterprise` (DX-003) — Mock for `SFEnterpriseClient`. All 15 methods, including `register_tenant()`, `get_isolation_scope()`, `configure_encryption()`, `encrypt_payload()`, `decrypt_payload()`, `configure_airgap()`, `assert_network_allowed()`, `check_all_services_health()`, `get_status()`, plus the `healthy` property.
- `MockSFSecurity` (DX-003) — Mock for `SFSecurityClient`: `run_owasp_audit()`, `add_threat()`, `generate_default_threat_model()`, `scan_dependencies()`, `run_static_analysis()`, `audit_logs_for_secrets()`, `run_full_scan()`, `get_status()`, plus the `healthy` property.
- `_MockBase` (DX-003) — Base class with a `.calls` recording list, `.configure_response(method, value)` for overriding default returns, and `._record(method, kwargs)` / `._resolve(method, default)` helpers.
- `mock_all_services()` (DX-003) — Context manager that patches all 11 `sf_*` singletons in `spanforge.sdk` with their mock counterparts and restores the originals on exit.
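The record-and-override pattern described for `_MockBase` can be sketched as below. The class names, the `(method, kwargs)` tuple stored in `.calls`, and the default `scan()` return shape are assumptions for illustration, not the shipped `spanforge.testing_mocks` API.

```python
from typing import Any


class MockBaseSketch:
    """Illustrative record/override base; not spanforge.testing_mocks._MockBase."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []  # assumed (method, kwargs) shape
        self._responses: dict[str, Any] = {}

    def configure_response(self, method: str, value: Any) -> None:
        """Override what `method` returns on subsequent calls."""
        self._responses[method] = value

    def _record(self, method: str, kwargs: dict[str, Any]) -> None:
        self.calls.append((method, dict(kwargs)))  # copy so later mutation can't alter history

    def _resolve(self, method: str, default: Any) -> Any:
        return self._responses.get(method, default)


class MockSecretsSketch(MockBaseSketch):
    """Hypothetical one-method mock showing how subclasses use the helpers."""

    def scan(self, text: str) -> dict[str, Any]:
        self._record("scan", {"text": text})
        return self._resolve("scan", {"detected": False, "hits": []})
```

Tests can then assert on `mock.calls` and switch behaviour per case with `configure_response()`, without subclassing per scenario.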
Added — Sandbox mode (Phase 12)
- `SFServiceClient._is_sandbox()` (DX-004) — All service clients check the `[spanforge] sandbox = true` config setting to route calls to a local, in-memory sandbox. No network calls, no quota consumption.
Added — CLI (Phase 12)
- `spanforge doctor` (DX-005) — Full environment diagnostic: config validation, service reachability, API key expiry check, PII/secrets pattern loading, gate YAML validation. Coloured pass/fail output.
Fixed — Mock default return values (Phase 12)
- Corrected 8 mock default return value constructors to match the actual `_types.py` dataclass signatures: `ExportResult`, `ObserveStatusInfo`, `PRRIResult`, `GateStatusInfo`, `DPADocument`, `CECStatusInfo`, `AlertStatusInfo`, `EnterpriseStatusInfo`.
Fixed — Integration tests (Phase 12)
- `test_audit_sign_and_verify` — Changed from `sign()` + `verify_chain()` to an `append()` → `export()` → `verify_chain()` workflow to match the API contract.
- `test_observe_export_spans` — Fixed assertion from `ExportResult.success` to `ExportResult.exported_count >= 1`.
- `test_alert_publish_and_deduplicate` — Fixed assertion from `PublishResult.topic` to `not PublishResult.suppressed`.
Quality gates (Phase 12)
- 130 new tests — comprehensive coverage of all 11 mock service clients (every method tested), sandbox mode, doctor CLI, integration workflows
- 5 351 total (12 skipped) — full regression pass, zero failures
- Coverage: `testing_mocks.py` 100% line + 100% branch
- ruff clean, mypy strict clean, bandit clean
[2.0.10] — Unreleased
Phase 11: Enterprise Hardening & Supply Chain Security
Added — spanforge.sdk.enterprise (Phase 11)
- `SFEnterpriseClient.register_tenant(project_id, org_id, *, data_residency, cross_project_read, allowed_project_ids) → TenantConfig` (ENT-001) — Registers a project with namespace isolation, a unique HMAC chain secret, and data residency configuration.
- `SFEnterpriseClient.get_isolation_scope(project_id) → IsolationScope` (ENT-002) — Returns the `(org_id, project_id)` composite key for namespace scoping.
- `SFEnterpriseClient.check_cross_project_access(source, targets)` (ENT-001) — Validates cross-project read access against the allow-list.
- `SFEnterpriseClient.enforce_data_residency(project_id, target_region)` (ENT-004) — Blocks data from leaving the configured region.
- `SFEnterpriseClient.configure_encryption(*, encrypt_at_rest, kms_provider, mtls_enabled, fips_mode) → EncryptionConfig` (ENT-010–013) — AES-256-GCM at rest, envelope encryption via cloud KMS, mTLS, FIPS 140-2 mode.
- `SFEnterpriseClient.encrypt_payload(plaintext, key) → dict` / `decrypt_payload(...) → bytes` (ENT-010) — AES-256-GCM encrypt/decrypt with HMAC tag verification.
- `SFEnterpriseClient.configure_airgap(*, offline, self_hosted) → AirGapConfig` (ENT-020/021) — Air-gap and self-hosted deployment configuration.
- `SFEnterpriseClient.assert_network_allowed()` (ENT-021) — Raises `SFAirGapError` if offline mode is active.
- `SFEnterpriseClient.check_all_services_health() → list[HealthEndpointResult]` (ENT-023) — Probes `/healthz` + `/readyz` for all 8 services (16 checks total).
- `sf_enterprise` singleton — pre-built `SFEnterpriseClient` in `spanforge.sdk.__init__`.
Added — spanforge.sdk.security (Phase 11)
- `SFSecurityClient.run_owasp_audit(...) → SecurityAuditResult` (ENT-030) — Walks all 10 OWASP API Security Top 10 categories with pass/fail per category.
- `SFSecurityClient.add_threat(service, category, threat, mitigation) → ThreatModelEntry` (ENT-031) — Adds a STRIDE threat model entry.
- `SFSecurityClient.generate_default_threat_model() → list[ThreatModelEntry]` (ENT-031) — Generates 10 default threats across all 8 service boundaries.
- `SFSecurityClient.scan_dependencies(packages) → list[DependencyVulnerability]` (ENT-033) — pip-audit wrapper for CVE scanning.
- `SFSecurityClient.run_static_analysis(source_files) → list[StaticAnalysisFinding]` (ENT-034) — bandit + semgrep wrapper for SAST.
- `SFSecurityClient.audit_logs_for_secrets(log_lines) → int` (ENT-035) — Replays log lines through 7 secret patterns (API keys, JWTs, AWS keys, GitHub tokens, OpenAI keys, Slack tokens, PEM keys).
- `SFSecurityClient.run_full_scan(...) → SecurityScanResult` — Combined dependency + static + secrets scan.
- `sf_security` singleton — pre-built `SFSecurityClient` in `spanforge.sdk.__init__`.
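`audit_logs_for_secrets()` counts secret hits in log lines. The sketch below shows the general approach using a few widely used detection heuristics. These regexes are illustrative, are not spanforge's seven patterns, and will produce both false positives and misses. Counting matching lines, rather than individual matches, is an assumption about what the returned `int` represents.

```python
import re
from typing import Iterable

# Illustrative heuristics only; not spanforge's seven patterns.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "slack_token": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def count_lines_with_secrets(lines: Iterable[str]) -> int:
    """Return how many log lines match at least one secret pattern."""
    return sum(1 for line in lines if any(p.search(line) for p in SECRET_PATTERNS.values()))
```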
Added — Types (Phase 11)
`DataResidency`, `IsolationScope`, `TenantConfig`, `EncryptionConfig`, `AirGapConfig`, `HealthEndpointResult`, `DependencyVulnerability`, `StaticAnalysisFinding`, `ThreatModelEntry`, `SecurityScanResult`, `SecurityAuditResult`, `EnterpriseStatusInfo`.
Added — Exceptions (Phase 11)
`SFEnterpriseError` (base), `SFIsolationError`, `SFDataResidencyError`, `SFEncryptionError`, `SFFIPSError`, `SFAirGapError`, `SFSecurityScanError`, `SFSecretsInLogsError`.
Added — CLI (Phase 11)
- `spanforge enterprise status|register-tenant|list-tenants|encrypt-config|health`
- `spanforge security owasp|threat-model|scan|audit-logs`
Added — Server Endpoints (Phase 11)
- `GET /healthz` — Kubernetes liveness probe.
- `GET /readyz` — Kubernetes readiness probe (probes all 8 services).
- `GET /v1/enterprise/status` — Enterprise hardening summary.
- `GET /v1/enterprise/health` — All-services health probe.
- `GET /v1/security/owasp` — OWASP audit results.
- `GET /v1/security/threat-model` — STRIDE threat model.
- `GET /v1/security/scan` — Full security scan.
Added — Deployment (Phase 11)
- `docker-compose.selfhosted.yml` — Self-hosted Docker Compose stack (ENT-020).
- `helm/spanforge/` — Helm chart skeleton for Kubernetes deployment (ENT-022).
[2.0.9] — Unreleased
Phase 10: T.R.U.S.T. Scorecard & HallucCheck Contract
Added — spanforge.sdk.trust (Phase 10)
- `SFTrustClient.get_scorecard(project_id, *, from_dt, to_dt, weights) → TrustScorecardResponse` (TRS-001/005) — Aggregates trust records from sf-audit and computes the five T.R.U.S.T. dimensions (Transparency · Reliability · UserTrust · Security · Traceability) with configurable weights. The overall score is a weighted average; colour bands: green ≥ 80, amber ≥ 60, red < 60.
- `SFTrustClient.get_badge(project_id) → TrustBadgeResult` (TRS-006) — Generates an SVG badge showing the T.R.U.S.T. score with a colour-coded background. Returns `svg`, `overall`, `colour_band`, and `etag`.
- `SFTrustClient.get_history(project_id, *, from_dt, to_dt, buckets) → list[TrustHistoryEntry]` (TRS-005) — Returns time-series snapshots by dividing the time range into equal buckets and computing a scorecard for each.
- `SFTrustClient.get_status() → TrustStatusInfo` — Returns service health information, including dimension count, total trust records, and pipelines registered.
- `sf_trust` singleton — pre-built `SFTrustClient` instance in `spanforge.sdk.__init__`, configured from environment variables.
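The scoring rules above can be sketched as follows: a weighted average with a default weight of 1.0, then green ≥ 80, amber ≥ 60, red < 60. The sketch also covers the equal-bucket split used by `get_history()`. The dimension keys and function names are hypothetical. How spanforge derives each per-dimension score from audit records is not shown.

```python
from __future__ import annotations

from datetime import datetime


def weighted_overall(scores: dict[str, float], weights: dict[str, float] | None = None) -> float:
    """Weighted average of dimension scores; dimensions missing from weights get 1.0."""
    weights = weights or {}
    total_weight = sum(weights.get(dim, 1.0) for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(score * weights.get(dim, 1.0) for dim, score in scores.items()) / total_weight


def colour_band(overall: float) -> str:
    if overall >= 80:
        return "green"
    if overall >= 60:
        return "amber"
    return "red"


def equal_buckets(start: datetime, end: datetime, n: int) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into n equal (bucket_start, bucket_end) windows."""
    if n < 1 or end <= start:
        raise ValueError("need n >= 1 and end > start")
    step = (end - start) / n
    edges = [start + step * i for i in range(n)] + [end]  # pin the last edge to end exactly
    return list(zip(edges[:-1], edges[1:]))
```

For example, scores of 90, 70, 80, 60, and 100 average to 80.0 (green) with equal weights. Tripling the weight of the 70 lowers the result to 540/7 ≈ 77.1, which is amber.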
Added — spanforge.sdk.pipelines (Phase 10)
- `score_pipeline(text, *, model, project_id, pii_action) → PipelineResult` (TRS-010) — PII scan → secrets scan → observe span → audit append. Orchestrates `sf_pii`, `sf_secrets`, `sf_observe`, and `sf_audit` in sequence.
- `bias_pipeline(bias_report, *, project_id, disparity_threshold) → PipelineResult` (TRS-011) — PII scan on segments → audit append → alert if disparity exceeds threshold → anonymise before export.
- `monitor_pipeline(drift_event, *, project_id, alert_on_drift) → PipelineResult` (TRS-012) — Observe drift span → alert if drift detected → OTel export.
- `risk_pipeline(prri_score, *, project_id, framework, policy_file) → PipelineResult` (TRS-013) — PRRI evaluation → alert if RED → gate block → CEC bundle generation.
- `benchmark_pipeline(benchmark_results, *, project_id, model) → PipelineResult` (TRS-014) — Audit append → alert if accuracy degraded → anonymise before export.
Added — CLI (Phase 10)
- `spanforge trust scorecard [--project-id PID]` — Display the five-pillar T.R.U.S.T. scorecard as a text table.
- `spanforge trust badge [--project-id PID]` — Write the T.R.U.S.T. SVG badge to stdout.
- `spanforge trust gate [--project-id PID]` — Run the composite trust gate. Exit code 1 = overall score below threshold (red band).
Added — HTTP server routes (Phase 10)
- `GET /v1/trust/scorecard?project_id=…` — Returns the T.R.U.S.T. scorecard as JSON.
- `GET /v1/trust/badge/{project_id}.svg` — Returns the SVG badge with the `image/svg+xml` content type.
- `POST /v1/trust-gate` — Evaluates the composite trust gate and returns pass/fail.
- `GET /v1/audit/{record_type}` — Query audit records by record type.
- `GET /v1/privacy/dsar/{subject_id}` — DSAR data export for a subject.
- `POST /v1/scan/secrets` — Secrets scanning endpoint.
New types (Phase 10)
| Type | Module | Description |
|---|---|---|
| `TrustScorecardResponse` | `spanforge.sdk._types` | Full scorecard: overall score, colour band, 5 dimensions, weights |
| `TrustDimension` | `spanforge.sdk._types` | Single dimension: score, trend, last_updated |
| `TrustDimensionWeights` | `spanforge.sdk._types` | Configurable weights for each pillar (default 1.0) |
| `TrustHistoryEntry` | `spanforge.sdk._types` | Time-series data point: timestamp, overall, 5 dimension scores |
| `TrustBadgeResult` | `spanforge.sdk._types` | SVG badge: svg, overall, colour_band, etag |
| `TrustStatusInfo` | `spanforge.sdk._types` | Service health: status, dimension_count, total_trust_records |
| `PipelineResult` | `spanforge.sdk._types` | Pipeline result: pipeline, success, audit_id, span_id, details |
New exceptions (Phase 10)
| Exception | Raised when |
|---|---|
| `SFTrustComputeError` | The underlying audit store is unreachable or the query fails |
| `SFPipelineError` | A critical step within a pipeline fails |
Quality gates (Phase 10)
- 28 new tests — trust client, pipelines, CLI commands, server routes
- 5 102 total (12 skipped) — full regression pass
- Coverage: `trust.py` 100% line + 100% branch, `pipelines.py` 100%/100%
- ruff clean, mypy strict clean, bandit clean
[2.0.8] — Unreleased
Phase 9: Integration Config & Local Fallback
Added — spanforge.sdk.config (Phase 9)
- `load_config_file(path?) → SFConfigBlock` (CFG-001/002) — Auto-discovers and parses `.halluccheck.toml` from the current directory, parent directories, or `$SPANFORGE_CONFIG_PATH`. Falls back to environment-variable defaults when no file is found. TOML parsing uses `tomllib` (Python 3.11+) or the vendored `tomli` fallback.
- `validate_config(block) → list[str]` (CFG-005) — Validates an `SFConfigBlock` against the v6.0 schema. Returns a list of human-readable error strings (empty when valid). Checks key names, value types, ranges, and inter-field consistency.
- `validate_config_strict(block) → None` (CFG-006) — Like `validate_config`, but raises `SFConfigValidationError` on the first error. Intended for startup / CI gates.
- `SFConfigBlock` (CFG-003) — Typed dataclass representing the full `[spanforge]` configuration block: `enabled`, `project_id`, `endpoint`, `api_key`, `services: SFServiceToggles`, `local_fallback: SFLocalFallbackConfig`, `pii: SFPIIConfig`, `secrets: SFSecretsConfig`.
- `SFServiceToggles` (CFG-003) — Per-service on/off toggles for all 8 services (`sf_pii`, `sf_secrets`, `sf_audit`, `sf_observe`, `sf_alert`, `sf_identity`, `sf_gate`, `sf_cec`).
- `SFLocalFallbackConfig` (CFG-003) — Fallback settings: `enabled`, `max_retries`, `timeout_ms`.
- `SFPIIConfig` / `SFSecretsConfig` (CFG-003) — Service-specific typed configuration blocks with `threshold` and `auto_block` settings.
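This sketch covers the discovery and parsing steps described for `load_config_file()`. The search order, with `$SPANFORGE_CONFIG_PATH` checked first and then the current directory and each parent, is an assumption, as are the function names. The real function also returns a typed `SFConfigBlock` and applies environment-variable defaults, which this sketch leaves out. `tomllib` requires a binary file handle, hence `open("rb")`.

```python
from __future__ import annotations

import os
from pathlib import Path

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # pragma: no cover
    import tomli as tomllib  # PyPI backport with the same API (spanforge vendors its own copy)

CONFIG_NAME = ".halluccheck.toml"


def find_config(start: Path | None = None, env_var: str = "SPANFORGE_CONFIG_PATH") -> Path | None:
    """Return the config path from env_var, else the nearest CONFIG_NAME walking upward."""
    env_path = os.environ.get(env_var)
    if env_path:
        candidate = Path(env_path)
        return candidate if candidate.is_file() else None
    here = (start or Path.cwd()).resolve()
    for directory in (here, *here.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


def load_spanforge_block(path: Path | None) -> dict:
    """Parse the [spanforge] table, or return {} when no file was found."""
    if path is None:
        return {}
    with path.open("rb") as fh:
        return tomllib.load(fh).get("spanforge", {})
```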
Added — spanforge.sdk.registry (Phase 9)
- `ServiceRegistry.get_instance() → ServiceRegistry` (CFG-010) — Thread-safe singleton holding references to all 8 service clients. Access individual clients via `registry.get("sf_pii")`.
- `ServiceRegistry.run_startup_check()` (CFG-011) — Pings all enabled services and reports per-service status: `up`, `degraded` (latency > 2 s), or `down`. Raises `SFStartupError` when any service is `down` and `local_fallback.enabled=False`.
- `ServiceRegistry.status_response() → dict` (CFG-012) — Returns a dict matching the `GET /v1/spanforge/status` specification. Each service entry includes `{status, latency_ms, last_checked_at}`.
- `ServiceRegistry.start_background_checker()` (CFG-013) — Launches a daemon thread that re-checks all services every 60 s. Status changes are logged at WARNING; recovery (down → up) at INFO.
- `ServiceHealth`, `ServiceStatus` — Typed enums for health status tracking.
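The up / degraded / down classification and the startup rule can be sketched as follows. The 2-second degraded threshold comes from the entry above. `probe`, `startup_check`, `Status`, and `StartupError` are hypothetical names, and a ping is modeled as any callable that raises on failure. The injectable `clock` makes the latency path testable without sleeping.

```python
import enum
import time
from typing import Callable, Dict, Tuple

DEGRADED_LATENCY_S = 2.0


class Status(str, enum.Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


class StartupError(RuntimeError):
    """Hypothetical stand-in for SFStartupError."""


def probe(ping: Callable[[], None], clock: Callable[[], float] = time.monotonic) -> Tuple[Status, float]:
    """Run one ping and classify it; returns (status, latency in ms)."""
    start = clock()
    try:
        ping()
    except Exception:
        return Status.DOWN, (clock() - start) * 1000
    latency_s = clock() - start
    status = Status.DEGRADED if latency_s > DEGRADED_LATENCY_S else Status.UP
    return status, latency_s * 1000


def startup_check(pings: Dict[str, Callable[[], None]], *, fallback_enabled: bool) -> Dict[str, Tuple[Status, float]]:
    """Probe every service; fail hard only when something is down and fallback is off."""
    results = {name: probe(ping) for name, ping in pings.items()}
    down = sorted(name for name, (status, _) in results.items() if status is Status.DOWN)
    if down and not fallback_enabled:
        raise StartupError(f"services down: {', '.join(down)}")
    return results
```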
Added — spanforge.sdk.fallback (Phase 9)
- pii_fallback(text) (CFG-020) — Local regex PII scan via spanforge.redact. Returns an entity list without a remote service dependency.
- secrets_fallback(text) (CFG-021) — Local regex secrets scan via spanforge.secrets. Returns the scan result.
- audit_fallback(record, schema_key) (CFG-022) — HMAC-chained JSONL append to a local file.
- observe_fallback(name, attributes) (CFG-023) — OTLP JSON output to stdout.
- alert_fallback(topic, payload, severity) (CFG-024) — Logs the alert to stderr at WARNING level.
- identity_fallback(token?) (CFG-025) — Trusts the SPANFORGE_LOCAL_TOKEN env var for CLI/local dev use.
- gate_fallback(gate_id, payload) (CFG-026) — Runs gate evaluation locally via spanforge.gate.
- cec_fallback(bundle_data) (CFG-027) — Writes the CEC bundle to a local JSONL file.
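HMAC chaining, as used by audit_fallback, means each appended line's signature covers the previous line's signature, so editing or deleting an earlier line breaks every later link. The sketch below assumes a hypothetical line layout (schema_key, record, prev, sig) and an all-zero genesis value; the real file format may differ. It rereads the file to find the last signature, which is simple but O(n) per append.

```python
import hashlib
import hmac
import json
from pathlib import Path

GENESIS = "0" * 64  # hypothetical "previous signature" for the first entry


def _last_signature(path: Path) -> str:
    """Signature of the newest entry, or GENESIS for a missing or empty file."""
    if not path.exists():
        return GENESIS
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return json.loads(lines[-1])["sig"] if lines else GENESIS


def append_chained(path: Path, record: dict, schema_key: str, key: bytes) -> str:
    """Append one entry whose HMAC covers the record and the previous signature."""
    prev = _last_signature(path)
    body = {"schema_key": schema_key, "record": record, "prev": prev}
    # Canonical JSON so the same body always produces the same bytes to sign.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    sig = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({**body, "sig": sig}, sort_keys=True) + "\n")
    return sig
```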
Added — CLI (Phase 9)
spanforge config validate [--file PATH] (CFG-007) — Validates .halluccheck.toml against the v6.0 schema. Exit codes: 0 = valid, 1 = validation errors, 2 = parse/I/O error.
New types (Phase 9)
| Type | Module | Description |
|---|---|---|
SFConfigBlock | spanforge.sdk.config | Full config representation |
SFServiceToggles | spanforge.sdk.config | Per-service enable/disable flags |
SFLocalFallbackConfig | spanforge.sdk.config | Fallback settings (enabled, retries, timeout) |
SFPIIConfig | spanforge.sdk.config | PII-specific configuration |
SFSecretsConfig | spanforge.sdk.config | Secrets-specific configuration |
ServiceRegistry | spanforge.sdk.registry | Singleton service registry |
ServiceHealth | spanforge.sdk.registry | Health status data |
ServiceStatus | spanforge.sdk.registry | Status enum (up/degraded/down) |
New exceptions (Phase 9)
| Exception | Raised when |
|---|---|
SFConfigError | .halluccheck.toml cannot be read (I/O error) or parsed |
SFConfigValidationError | Config block fails strict validation |
SFStartupError | Service is down on startup and fallback is disabled |
SFServiceUnavailableError | Service becomes unreachable at runtime |
New environment variables (Phase 9)
| Variable | Default | Description |
|---|---|---|
SPANFORGE_ENDPOINT | "" | SpanForge API endpoint URL |
SPANFORGE_API_KEY | "" | API key for authentication |
SPANFORGE_PROJECT_ID | "default" | Project identifier |
SPANFORGE_PII_THRESHOLD | 0.8 | Minimum PII detection confidence |
SPANFORGE_SECRETS_AUTO_BLOCK | true | Auto-block high-risk secret types |
SPANFORGE_LOCAL_TOKEN | "" | Local identity token (dev/CLI mode) |
SPANFORGE_FALLBACK_TIMEOUT_MS | 5000 | Timeout before fallback activation |
Quality gates (Phase 9)
- 122 new tests — config parser, registry lifecycle, fallback correctness, CLI validation
- 5 074 total (12 skipped) — full regression pass
- Coverage: config.py 100% line + 100% branch, fallback.py 100%/100%, registry.py 99%/99%
- ruff clean, mypy strict clean, bandit clean
2.0.7
Phase 8: CI/CD Gate Pipeline (sf-gate)
Added — spanforge.sdk.gate (Phase 8)
- SFGateClient.evaluate(gate_id, payload, *, project_id) → GateEvaluationResult (GAT-004) — Evaluates a single named gate against payload. Applies the gate logic (schema validation, secrets scan, dependency audit, performance regression, or hallucination check), writes a GateArtifact to the artifact store, and returns the structured result immediately.
- SFGateClient.evaluate_prri(prri_score, *, project_id, framework, policy_file, dimension_breakdown) → PRRIResult (GAT-010/011) — Evaluates a Pre-Release Readiness Index (PRRI) score against configurable thresholds. Scores ≥ SPANFORGE_GATE_PRRI_RED_THRESHOLD (default 70) receive a RED verdict and block release; 30–69 = AMBER (warn); < 30 = GREEN (pass).
- SFGateClient.run_pipeline(gate_config_path, *, context) → GateRunResult (GAT-002) — Parses and executes a YAML gate pipeline file. Gates with on_fail: block that evaluate to FAIL raise SFGatePipelineError. Context variables support ${var} substitution in gate commands.
- SFGateClient.get_artifact(gate_id) → GateArtifact | None (GAT-003) — Retrieves the most recent stored artifact for a gate. Returns None if no artifact is found.
- SFGateClient.list_artifacts(*, project_id) → list[GateArtifact] — Lists all stored artifacts, optionally filtered to a project. Returns most-recent-first.
- SFGateClient.purge_artifacts(*, older_than_days) → int — Deletes artifact files older than older_than_days from the store. Returns the count of files removed.
- SFGateClient.get_status() → GateStatusInfo — Returns a live status snapshot: {status, gate_count, artifact_count, artifact_dir, retention_days, open_circuit_breakers, healthy}.
- SFGateClient.configure(config) → None — Overrides gate settings at runtime. Keys omitted from config keep their current (env-var-sourced or default) values.
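The PRRI banding above, written out as a small pure function. This is a sketch assuming the AMBER floor is fixed at 30 while only the RED threshold is configurable (mirroring SPANFORGE_GATE_PRRI_RED_THRESHOLD); the boolean it returns plays the role of the allow field of PRRIResult.

```python
from enum import Enum


class PRRIVerdict(Enum):
    GREEN = "GREEN"
    AMBER = "AMBER"
    RED = "RED"


def prri_verdict(score: float, red_threshold: float = 70.0,
                 amber_floor: float = 30.0) -> tuple[PRRIVerdict, bool]:
    """Map a PRRI score to (verdict, allow). Higher scores mean more release risk."""
    if score >= red_threshold:
        return PRRIVerdict.RED, False   # blocks release
    if score >= amber_floor:
        return PRRIVerdict.AMBER, True  # warn, but allow
    return PRRIVerdict.GREEN, True      # pass
```

Keeping the mapping pure makes the threshold behaviour easy to test independently of artifact storage.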
Gate YAML engine (Phase 8)
The GateRunner class parses YAML gate pipeline files and dispatches each gate to its executor. Supports sequential and parallel execution (parallel: true), per-gate timeouts (timeout_seconds), conditional skipping (skip_on), and three failure policies (on_fail: block | warn | report).
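Two pieces of the runner's behaviour are easy to illustrate in isolation: ${var} substitution in gate commands, and deciding which failures block. The sketch below uses the standard library's string.Template, whose safe_substitute leaves unknown placeholders untouched; whether GateRunner treats missing variables the same way is an assumption. GateOutcome is a hypothetical type.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class GateOutcome:
    gate_id: str
    passed: bool
    on_fail: str  # "block" | "warn" | "report"


def render_command(command: str, context: dict[str, str]) -> str:
    """Expand ${var} placeholders; unknown names are left as written."""
    return Template(command).safe_substitute(context)


def blocking_failures(outcomes: list[GateOutcome]) -> list[str]:
    """Gate IDs that should abort the pipeline: a FAIL under on_fail: block."""
    return [o.gate_id for o in outcomes if not o.passed and o.on_fail == "block"]
```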
Built-in gate executors:
| Type | Description |
|---|---|
schema_validation | Validates payload against the SpanForge v2.0 JSON Schema. |
dependency_security | Audits package dependencies for known CVEs via the advisory database. |
secrets_scan | Runs the built-in 20-pattern secrets scanner over target files. |
performance_regression | Compares p50/p95/p99 latencies against a stored baseline. |
halluccheck_prri | Evaluates the Pre-Release Readiness Index against policy thresholds. |
halluccheck_trust | Composite trust gate: HRI critical rate + PII window + secrets window. |
New types (Phase 8)
Added to spanforge.sdk._types and re-exported from spanforge.sdk:
| Type | Description |
|---|---|
GateVerdict | Enum — PASS, FAIL, WARN, SKIPPED. |
PRRIVerdict | Enum — GREEN (< 30), AMBER (30–69), RED (≥ 70). |
GateArtifact | Immutable record written to the artifact store after each gate evaluation: {gate_id, name, verdict, metrics, timestamp, duration_ms, artifact_path}. |
GateEvaluationResult | Result of a single evaluate() call: {gate_id, verdict, metrics, artifact_url, duration_ms, timestamp}. |
PRRIResult | Result of evaluate_prri(): {gate_id, prri_score, verdict, dimension_breakdown, framework, policy_file, timestamp, allow}. |
TrustGateResult | Detailed trust gate outcome: {gate_id, verdict, hri_critical_rate, hri_critical_threshold, pii_detected, pii_detections_24h, secrets_detected, secrets_detections_24h, failures, timestamp, pipeline_id, project_id}. |
GateStatusInfo | Live status snapshot: {status, gate_count, artifact_count, artifact_dir, retention_days, open_circuit_breakers, healthy}. |
New exceptions (Phase 8)
Added to spanforge.sdk._exceptions and re-exported from spanforge.sdk:
| Exception | Base | When raised |
|---|---|---|
SFGateError | SpanForgeError | Base for all gate exceptions. Never raised directly. |
SFGateEvaluationError | SFGateError | A single gate evaluation failed (logic error, unsupported payload, or FAIL with block policy). |
SFGatePipelineError | SFGateError | Pipeline runner encountered one or more blocking failures. failed_gates: list[str] attribute. |
SFGateTrustFailedError | SFGateError | Trust gate detected a blocking condition. trust_result: TrustGateResult attribute carries full details. |
SFGateSchemaError | SFGateError | YAML gate configuration is invalid (missing field, unrecognised type, or malformed pass condition). |
Environment variables (Phase 8)
| Variable | Default | Description |
|---|---|---|
SPANFORGE_GATE_ARTIFACT_DIR | .sf-gate/artifacts | Directory for persisted GateArtifact JSON files. |
SPANFORGE_GATE_ARTIFACT_RETENTION_DAYS | 90 | Artifact retention period for purge_artifacts(). |
SPANFORGE_GATE_PRRI_RED_THRESHOLD | 70 | PRRI scores ≥ this value receive RED verdict and block release. |
SPANFORGE_GATE_HRI_CRITICAL_THRESHOLD | 0.05 | HRI critical rate threshold (0–1) for the trust gate. |
SPANFORGE_GATE_PII_WINDOW_HOURS | 24 | PII detection audit window in hours for the trust gate. |
SPANFORGE_GATE_SECRETS_WINDOW_HOURS | 24 | Secrets detection audit window in hours for the trust gate. |
Promoted from stub (Phase 8)
spanforge.sdk.sf_gate — promoted from a no-op stub to a fully operational SFGateClient instance backed by GateRunner, six gate executors, and a durable JSON artifact store.
Quality gates
- 174 new tests — tests/test_sf_gate.py covers all 45 acceptance criteria (GAT-001 through GAT-045): single-gate evaluation, PRRI verdicts (GREEN/AMBER/RED), trust gate blocking/passing, YAML pipeline runner, parallel execution, artifact persistence, purge, status, configure, all five exception types, and 30+ edge cases.
- 4,952 total tests — all passing.
- ruff, mypy, and bandit clean.
2.0.6 — Unreleased
Phase 7: Alert Routing Service (sf-alert)
Added — spanforge.sdk.alert (Phase 7)
- SFAlertClient.publish(topic, payload, *, severity, project_id) → PublishResult (ALT-001) — Publishes an alert to all configured sinks. Validates the topic, checks maintenance windows, deduplicates by (topic, project_id), applies per-project rate limits, and returns a PublishResult immediately. The first alert in a topic-prefix group is dispatched immediately; subsequent alerts within 2 minutes are coalesced.
- SFAlertClient.acknowledge(alert_id) → bool (ALT-020) — Cancels the escalation timer for a CRITICAL alert. Returns True if a pending timer was found and cancelled.
- SFAlertClient.register_topic(topic, description, default_severity, *, runbook_url, dedup_window_seconds) → None (ALT-002/003) — Registers a custom topic in the topic registry with an optional runbook URL and per-topic deduplication window.
- SFAlertClient.set_maintenance_window(project_id, start, end) → None (ALT-030) — Suppresses all alerts for a project during the specified UTC window. Appends a spanforge.alert.maintenance.v1 audit record.
- SFAlertClient.remove_maintenance_windows(project_id) → int — Removes all maintenance windows for a project; returns the count removed.
- SFAlertClient.get_alert_history(*, project_id, topic, from_dt, to_dt, status, limit) → list[AlertRecord] (ALT-042) — Retrieves alert history with optional filtering by project, topic, time range, and status. Returns most-recent-first, bounded by limit (default 100).
- SFAlertClient.get_status() → AlertStatusInfo — Returns {status, publish_count, suppress_count, sink_count, queue_depth, pending_escalations, healthy}.
- SFAlertClient.add_sink(alerter, name) → None — Dynamically adds a sink at runtime.
- SFAlertClient.healthy: bool — True when the worker thread is alive.
- SFAlertClient.shutdown(timeout) → None — Drains the dispatch queue, cancels all escalation timers, and stops the worker thread.
- Eight built-in topics (ALT-003) — Pre-registered: halluccheck.drift.red, halluccheck.drift.amber, halluccheck.pii.detected, halluccheck.cost.exceeded, halluccheck.latency.breach, halluccheck.audit.gap, halluccheck.security.violation, halluccheck.compliance.breach.
- Deduplication (ALT-010) — (topic, project_id) pairs are suppressed for dedup_window_seconds (default: 300 s). Per-topic overrides via register_topic.
- Alert grouping (ALT-011) — The first alert in a (topic_prefix, project_id) group is dispatched immediately; subsequent alerts within 2 minutes are coalesced into one notification.
- CRITICAL escalation policy (ALT-020/021) — CRITICAL alerts schedule a threading.Timer (default: 900 s). If the alert is not acknowledged in time, it is re-dispatched with an [ESCALATED] title prefix.
- Maintenance-window suppression (ALT-030) — Per-project UTC windows; suppressed alerts are audit-logged and returned as PublishResult(suppressed=True).
- Per-project rate limiting (ALT-012) — Sliding window of 60 alerts/minute; configurable. Strict mode raises SFAlertRateLimitedError.
- Audit integration — Every publish, suppression, and maintenance-window event is appended to sf_audit under schema spanforge.alert.v1 (best-effort; failures logged at DEBUG).
- Per-sink circuit breakers — Each sink has its own _CircuitBreaker (5-failure threshold, 30 s reset). Failing sinks are bypassed without blocking other sinks.
- History bounded at 10,000 records — Oldest entries are discarded on overflow.
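The per-sink circuit breaker (5 consecutive failures open it, 30 s before calls are allowed again) can be sketched as below. This is an illustrative class, not spanforge's _CircuitBreaker; the injectable clock is a hypothetical addition that makes the timing testable.

```python
import threading
import time
from typing import Callable, Optional


class CircuitBreaker:
    """CLOSED until `threshold` consecutive failures, then OPEN for `reset_s` seconds."""

    def __init__(self, threshold: int = 5, reset_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_s = reset_s
        self._clock = clock
        self._lock = threading.Lock()  # sinks may be called from several threads
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        with self._lock:
            if self._opened_at is None:
                return False
            # Once the reset window has elapsed, let calls through again.
            return self._clock() - self._opened_at < self.reset_s

    def record_success(self) -> None:
        with self._lock:
            self._failures = 0
            self._opened_at = None

    def record_failure(self) -> None:
        with self._lock:
            self._failures += 1
            if self._failures >= self.threshold:
                self._opened_at = self._clock()
```

A dispatcher would skip any sink whose breaker reports is_open() and call record_success() or record_failure() after each delivery attempt. In this sketch only a success clears the failure count, so a failure after the reset window re-opens the breaker at once.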
New sinks (Phase 7)
All sinks in spanforge.sdk.alert:
| Class | Protocol | Security |
|---|---|---|
WebhookAlerter(url, secret, timeout) | POST JSON | HMAC X-SF-Signature: sha256=<hex>, SSRF guard |
OpsGenieAlerter(api_key, region, timeout) | OpsGenie v2 Alerts API | GenieKey auth, P1–P5 priority map, US/EU URL |
VictorOpsAlerter(rest_endpoint_url, timeout) | VictorOps REST Endpoint | message_type CRITICAL/WARNING/INFO |
IncidentIOAlerter(api_key, timeout) | Incident.io v2 Alerts API | Bearer auth, critical/major/minor severity |
SMSAlerter(account_sid, auth_token, from_number, to_numbers, timeout) | Twilio Messages API | Basic auth, 160-char truncation, repr=False token |
TeamsAdaptiveCardAlerter(webhook_url, timeout) | Teams Incoming Webhook | Adaptive Card v1.3, severity colour band, FactSet, Acknowledge/Silence action buttons |
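The X-SF-Signature header sent by WebhookAlerter follows the common sha256=<hex> convention. The sketch below shows how a receiver could check it, assuming the HMAC-SHA256 is computed over the raw request body with the shared secret as key; confirm the exact signed bytes against the sink's source before relying on this.

```python
import hashlib
import hmac


def sign_body(body: bytes, secret: str) -> str:
    """Header value: 'sha256=' followed by the hex HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(body: bytes, header: str, secret: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_body(body, secret), header)
```

Receivers should verify against the exact bytes received, before any JSON parsing or re-serialisation.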
New types (Phase 7)
Added to spanforge.sdk._types and re-exported from spanforge.sdk:
| Type | Description |
|---|---|
AlertSeverity | Enum: INFO, WARNING, HIGH, CRITICAL; from_str() with fallback to WARNING |
PublishResult | {alert_id, routed_to, suppressed} |
TopicRegistration | {topic, description, default_severity, runbook_url, dedup_window_seconds} |
MaintenanceWindow | {project_id, start, end} |
AlertRecord | {alert_id, topic, severity, project_id, payload, sinks_notified, suppressed, status, timestamp} |
AlertStatusInfo | {status, publish_count, suppress_count, sink_count, queue_depth, pending_escalations, healthy} |
New exceptions (Phase 7)
Added to spanforge.sdk._exceptions and re-exported from spanforge.sdk:
| Exception | Trigger |
|---|---|
SFAlertError | Base for all alert errors |
SFAlertPublishError | All sinks circuit-open on publish |
SFAlertRateLimitedError | Per-project rate limit exceeded (strict mode) |
SFAlertQueueFullError | Dispatch queue full (> 1 000 items) |
Environment variables (Phase 7)
| Variable | Effect |
|---|---|
SPANFORGE_ALERT_SLACK_WEBHOOK | Auto-register Slack sink |
SPANFORGE_ALERT_TEAMS_WEBHOOK | Auto-register Teams Adaptive Card sink |
SPANFORGE_ALERT_PAGERDUTY_KEY | Auto-register PagerDuty sink |
SPANFORGE_ALERT_OPSGENIE_KEY | Auto-register OpsGenie sink |
SPANFORGE_ALERT_OPSGENIE_REGION | OpsGenie region (us or eu, default us) |
SPANFORGE_ALERT_VICTOROPS_URL | Auto-register VictorOps sink |
SPANFORGE_ALERT_WEBHOOK_URL | Auto-register generic webhook sink |
SPANFORGE_ALERT_WEBHOOK_SECRET | HMAC secret for generic webhook |
SPANFORGE_ALERT_DEDUP_SECONDS | Override dedup window (default 300) |
SPANFORGE_ALERT_RATE_LIMIT | Override rate limit per minute (default 60) |
SPANFORGE_ALERT_ESCALATION_WAIT | Override escalation wait seconds (default 900) |
Promoted from stub (Phase 7)
sf_alert — previously _UnimplementedClient("alert"); now SFAlertClient(_get_config()).
2.0.5 — Unreleased
Phase 6: Observability Named SDK (sf-observe)
Added — spanforge.sdk.observe (Phase 6)
- SFObserveClient.export_spans(spans, *, receiver_config=None) → ExportResult (OBS-001) — Exports a list of OTLP span dicts to the configured backend. Accepts a per-call ReceiverConfig override to route to any OTLP-compatible collector. Returns {exported_count, failed_count, backend, exported_at}. Falls back to the local buffer on failure when local_fallback_enabled=True.
- SFObserveClient.add_annotation(event_type, payload, *, project_id) → str (OBS-002) — Stores a structured annotation (deploy marker, threshold breach, etc.) in the in-memory annotation log. Returns the generated UUID annotation ID.
- SFObserveClient.get_annotations(event_type, from_dt, to_dt, *, project_id="") → list[Annotation] (OBS-003) — Retrieves stored annotations filtered by event type ("*" for all), ISO-8601 time window, and optional project ID.
- SFObserveClient.emit_span(name, attributes) → str (OBS-004) — Constructs a fully formed OTLP span with W3C TraceContext, OTel GenAI attributes, and error detection; applies the active sampler; emits the span to the backend; and returns the 16-hex-character span ID.
- SFObserveClient.get_status() → ObserveStatusInfo — Returns {status, backend, sampler_strategy, span_count, annotation_count, export_count, last_export_at, healthy}.
- SFObserveClient.healthy: bool (OBS-043) — True unless the most recent export attempt raised an unrecovered error.
- SFObserveClient.last_export_at: str | None (OBS-043) — ISO-8601 timestamp of the last successful export_spans call.
- make_traceparent(trace_id_hex, span_id_hex, *, sampled) → str (OBS-011) — Encodes a W3C traceparent header value (00-<32>-<16>-{01|00}).
- extract_traceparent(traceparent) → tuple[str, str, bool] (OBS-011) — Parses a traceparent header; raises ValueError on malformed input.
- W3C Baggage injection (OBS-012) — emit_span injects the project_id, domain, and tier keys into a baggage span attribute when they are present in attributes.
- OTel GenAI semantic conventions (OBS-010, OBS-014) — gen_ai.* attribute keys are forwarded unchanged; otel.status_code and exception.message are handled per the OTel spec.
- Error span detection (OBS-015) — Sets status.code = STATUS_CODE_ERROR and otel.status_code = ERROR when attributes["status"] == "error" or attributes["otel.status_code"] == "ERROR".
- OTel resource attributes (OBS-013) — Every span carries service.name, service.version, service.namespace, telemetry.sdk.name, telemetry.sdk.language, telemetry.sdk.version, and deployment.environment.
- Sampling strategies (OBS-031) — Configurable via SPANFORGE_OBSERVE_SAMPLER (always_on [default], always_off, parent_based, trace_id_ratio). The sample rate is controlled by SPANFORGE_OBSERVE_SAMPLE_RATE (default 1.0).
- Backend routing (OBS-001, OBS-040, OBS-041) — SPANFORGE_OBSERVE_BACKEND selects from: local (bounded deque, default), otlp (/v1/traces), datadog (/api/v0.2/traces), grafana (/api/v1/push), splunk (HEC /services/collector), elastic (ECS /_bulk).
- Splunk HEC export (OBS-040) — Spans are serialised as {"event": <span>, "sourcetype": "spanforge:otel"} events.
- Elastic ECS export (OBS-041) — Spans are translated to Elastic Common Schema with trace.id, transaction.id, span.name, event.outcome, and @timestamp.
- Local span buffer — Bounded at 10,000 spans; oldest entries are discarded on overflow.
- Thread safety — _ObserveSessionStats is protected by a threading.Lock; the annotation store uses a separate lock.
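The traceparent format handled by make_traceparent and extract_traceparent is defined by W3C Trace Context: version 00, a 32-hex-character trace ID, a 16-hex-character parent span ID, and a 2-hex-character flags field whose lowest bit means sampled. A minimal re-implementation sketch (not spanforge's code) that accepts only version 00 and lowercase hex:

```python
import re

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id_hex: str, span_id_hex: str, *, sampled: bool) -> str:
    return f"00-{trace_id_hex}-{span_id_hex}-{'01' if sampled else '00'}"


def extract_traceparent(traceparent: str) -> tuple[str, str, bool]:
    m = _TRACEPARENT.match(traceparent.strip())
    if m is None:
        raise ValueError(f"malformed traceparent: {traceparent!r}")
    trace_id, span_id, flags = m.groups()
    # All-zero IDs are invalid per the W3C Trace Context specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("trace-id and parent-id must not be all zeros")
    return trace_id, span_id, bool(int(flags, 16) & 0x01)
```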
New types (Phase 6)
Added to spanforge.sdk._types and re-exported from spanforge.sdk:
| Type | Description |
|---|---|
SamplerStrategy | Enum: ALWAYS_ON, ALWAYS_OFF, PARENT_BASED, TRACE_ID_RATIO |
ReceiverConfig | {endpoint, headers, timeout_seconds} — per-call OTLP receiver config |
ExportResult | {exported_count, failed_count, backend, exported_at} |
Annotation | {annotation_id, event_type, payload, project_id, created_at} |
ObserveStatusInfo | {status, backend, sampler_strategy, span_count, annotation_count, export_count, last_export_at, healthy} |
New exceptions (Phase 6)
Added to spanforge.sdk._exceptions and re-exported from spanforge.sdk:
| Exception | Trigger |
|---|---|
SFObserveError | Base for all observe errors |
SFObserveExportError | export_spans transport or HTTP failure |
SFObserveEmitError | emit_span input validation or export failure |
SFObserveAnnotationError | add_annotation / get_annotations validation failure |
Promoted from stub (Phase 6)
sf_observe — previously _UnimplementedClient("observe"); now SFObserveClient(_get_config()).
2.0.4 — Unreleased
Phase 5: Compliance Evidence Chain (sf-cec)
Added — spanforge.sdk.cec (Phase 5)
- SFCECClient.build_bundle(project_id, date_range, frameworks=None) → BundleResult — orchestrates a full CEC bundle: exports audit records for all 6 schema keys via sf_audit, runs clause mapping for each requested framework, assembles a signed ZIP, and returns {bundle_id, zip_path, hmac_manifest, record_counts, frameworks_covered, generated_at}.
- ZIP structure — halluccheck_cec_{project}_{from}_{to}.zip containing:
  - manifest.json — record inventory with HMAC-SHA256 signature
  - clause_map.json — per-framework clause satisfaction entries (SATISFIED / PARTIAL / GAP)
  - chain_proof.json — verify_chain() result from sf-audit
  - attestation.json — HMAC-signed attestation metadata
  - rfc3161_timestamp.tsr — RFC 3161 trusted timestamp stub
  - score_records/, bias_reports/, prri_records/, drift_events/, pii_detections/, gate_evaluations/ — NDJSON evidence per schema key
- SFCECClient.verify_bundle(zip_path) → BundleVerificationResult — re-computes the manifest HMAC, verifies the chain proof, validates TSR presence, and returns {bundle_id, manifest_valid, chain_valid, timestamp_valid, overall_valid, errors}.
- SFCECClient.generate_dpa(project_id, controller_details, processor_details, *, subject_categories, transfer_mechanisms, retention_period_days, law_of_contract) → DPADocument — generates a GDPR Article 28 Data Processing Agreement. Returns {document_id, project_id, controller_details, processor_details, generated_at, content, subject_categories, transfer_mechanisms}.
- SFCECClient.get_status() → CECStatusInfo — returns {byos_provider, bundle_count, last_bundle_at, frameworks_supported}.
- Supported regulatory frameworks — eu_ai_act, iso_42001, nist_ai_rmf, iso27001, soc2.
- Clause mapping detail per framework:
  - EU AI Act — Art. 9 (Risk Management), Art. 10 (Data Governance), Art. 12 (Record-keeping), Art. 13 (Transparency), Art. 14 (Human Oversight), Art. 15 (Accuracy & Robustness)
  - ISO/IEC 42001 — Clause 6.1 (Risk assessment), 8.3 (Impact assessment), 9.1 (Monitoring), 10 (Improvement)
  - NIST AI RMF — GOVERN, MAP, MEASURE, MANAGE
  - ISO/IEC 27001 Annex A — A.12.4.1, A.12.4.2, A.12.4.3
  - SOC 2 — CC6, CC7, CC9
- BYOS detection — respects the SPANFORGE_AUDIT_BYOS_PROVIDER env var; get_status() reflects the provider.
- HMAC signing — uses the SPANFORGE_SIGNING_KEY env var; warns at client init if the key is unset or set to the insecure default.
- Thread safety — the _CECSessionStats dataclass uses a threading.Lock() to protect bundle_count and last_bundle_at.
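The sign-then-verify idea behind the bundle manifest can be sketched with the standard library alone. The layout here (one records.ndjson per evidence folder, plus a manifest holding record counts, per-file SHA-256 digests, and an HMAC over that payload) is a hypothetical simplification; the real bundle also carries clause_map.json, chain_proof.json, attestation.json, and the RFC 3161 token.

```python
import hashlib
import hmac
import io
import json
import zipfile


def build_bundle(evidence: dict[str, list[dict]], key: bytes) -> bytes:
    """Write NDJSON evidence per folder plus an HMAC-signed manifest; return ZIP bytes."""
    counts: dict[str, int] = {}
    digests: dict[str, str] = {}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder, records in evidence.items():
            name = f"{folder}/records.ndjson"  # hypothetical file name
            data = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records).encode("utf-8")
            zf.writestr(name, data)
            counts[folder] = len(records)
            digests[name] = hashlib.sha256(data).hexdigest()
        payload = json.dumps({"record_counts": counts, "sha256": digests}, sort_keys=True)
        signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        zf.writestr("manifest.json", json.dumps({"payload": payload, "hmac_sha256": signature}))
    return buf.getvalue()


def verify_bundle(zip_bytes: bytes, key: bytes) -> bool:
    """True only if the manifest HMAC matches and every listed file hashes correctly."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        payload = manifest["payload"]
        expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, manifest["hmac_sha256"]):
            return False
        digests = json.loads(payload)["sha256"]
        return all(hashlib.sha256(zf.read(name)).hexdigest() == digest
                   for name, digest in digests.items())
```

Signing the digests rather than the raw files keeps the manifest small while still detecting any change to an evidence file.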
New types (Phase 5)
Added to spanforge.sdk._types and re-exported from spanforge.sdk:
| Type | Description |
|---|---|
ClauseSatisfaction | Enum: SATISFIED, PARTIAL, GAP |
ClauseMapEntry | {framework, clause_id, clause_name, description, status, evidence_count} |
BundleResult | Result of build_bundle() |
BundleVerificationResult | Result of verify_bundle() |
DPADocument | Result of generate_dpa() |
CECStatusInfo | Result of get_status() |
New exceptions (Phase 5)
Added to spanforge.sdk._exceptions and re-exported from spanforge.sdk:
| Exception | When raised |
|---|---|
SFCECError | Base class for all sf-cec errors |
SFCECBuildError | Bundle assembly fails (ZIP write error, HMAC failure) |
SFCECVerifyError | Bundle verification fails (file not found, HMAC mismatch, tampered manifest) |
SFCECExportError | DPA generation or export fails |
SDK singleton
sf_cec: SFCECClient singleton registered in spanforge.sdk; loaded from _get_config() on import. configure() now recreates sf_cec alongside the other service clients.
Quality Gates (Phase 5)
- 148 Phase 5 tests passing, 0 failures.
- ruff check ✅ | mypy --strict ✅ | bandit -r -ll ✅
- 4 544 total tests passing across the full suite (12 skipped).
2.0.3 — Unreleased
Phase 4: Audit Service High-Level API (sf-audit)
Added — spanforge.sdk.audit (Phase 4)
- SFAuditClient.append(record, schema_key, *, project_id, strict_schema) -> AuditAppendResult — validates the schema key, appends an HMAC-chained record to the local store, and writes the T.R.U.S.T. feed for score schemas. Thread-safe.
- SFAuditClient.sign(record) -> SignedRecord — HMAC-SHA256 signs a raw dict.
- SFAuditClient.verify_chain(records) -> dict — re-derives and verifies HMAC chain integrity; returns {valid, verified_count, tampered_count, first_tampered, gaps}.
- SFAuditClient.query(*, schema_key, project_id, from_dt, to_dt, limit) -> list[dict] — O(log n) SQLite-indexed date-range query with a linear-scan fallback.
- SFAuditClient.export(*, format, compress) -> bytes — JSONL or CSV export of the full local store, with optional gzip compression.
- SFAuditClient.get_trust_scorecard(*, project_id, from_dt, to_dt) -> TrustScorecard — aggregates T.R.U.S.T. dimensions (hallucination, PII hygiene, secrets hygiene, gate pass rate, compliance posture).
- SFAuditClient.generate_article30_record(*, project_id, controller_name, ...) -> Article30Record — GDPR Article 30 Records of Processing Activity.
- SFAuditClient.get_status() -> AuditStatusInfo — returns backend, record count, chain length, BYOS provider, and last record timestamp.
- Schema key registry with 13 known keys (halluccheck.*, spanforge.*); strict_schema=False allows custom keys.
- BYOS routing via the SPANFORGE_AUDIT_BYOS_PROVIDER env var (s3, azure, gcs, r2).
- SQLite WAL-mode index for O(log n) date-range queries; in-memory fallback.
- New exceptions: SFAuditError, SFAuditSchemaError, SFAuditAppendError, SFAuditQueryError.
- New types: AuditAppendResult, SignedRecord, TrustDimension, TrustScorecard, Article30Record, AuditStatusInfo.
- sf_audit singleton registered in spanforge.sdk with configure() support.
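One way to get O(log n) date-range lookups without SQLite is binary search over timestamps kept in sorted order, which is the same idea a B-tree index provides. The in-memory sketch below illustrates the lookup cost only; it is not the SDK's fallback path, and its inserts are O(n) because Python lists shift elements. Both ends of the range are treated as inclusive, which is an assumption.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class DateIndex:
    """Keeps (timestamp, record) pairs sorted so range queries cost O(log n + k)."""

    def __init__(self) -> None:
        self._keys: list[datetime] = []
        self._records: list[dict] = []

    def add(self, ts: datetime, record: dict) -> None:
        i = bisect_right(self._keys, ts)  # equal timestamps keep insertion order
        self._keys.insert(i, ts)
        self._records.insert(i, record)

    def query(self, from_dt: datetime, to_dt: datetime, limit: int = 100) -> list[dict]:
        lo = bisect_left(self._keys, from_dt)
        hi = bisect_right(self._keys, to_dt)
        return self._records[lo:min(hi, lo + limit)]
```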
Phase 3: PII Service Hardening (sf-pii)
Added — spanforge.sdk.pii (Phase 3)
- SFPIIClient.scan_text(text, *, language) -> PIITextScanResult — full-text PII scan via Presidio with redact fallback; returns {entities, redacted_text, detected}.
- SFPIIClient.anonymise(payload) -> PIIAnonymisedResult — recursively replaces PII in all string fields with <TYPE> placeholders; returns {clean_payload, redaction_manifest}.
- SFPIIClient.scan_batch(texts) -> list[PIITextScanResult] — parallel batch scan.
- SFPIIClient.apply_pipeline_action(scan_result, action, threshold) -> PIIPipelineResult — enforces "flag" / "redact" / "block" ("block" raises SFPIIBlockedError); filters by confidence threshold (default 0.85).
- SFPIIClient.get_status() -> PIIServiceStatus — returns {status, presidio_available, entity_types_loaded, last_scan_at}.
- SFPIIClient.erase_subject(subject_id, project_id) -> ErasureReceipt — GDPR Article 17 erasure; the subject ID is SHA-256 hashed in the receipt.
- SFPIIClient.export_subject_data(subject_id, project_id) -> DSARExport — CCPA DSAR export aggregating all audit events for a subject.
- SFPIIClient.safe_harbor_deidentify(text) -> SafeHarborResult — HIPAA Safe Harbor de-identification of 18 PHI identifier types (45 CFR §164.514(b)(2)); dates → year, ages > 89 → "90+", ZIP → first 3 digits.
- SFPIIClient.audit_training_data(dataset_path, *, max_records) -> TrainingDataPIIReport — EU AI Act Article 10 training-data PII prevalence report.
- SFPIIClient.get_pii_stats(project_id) -> list[PIIHeatMapEntry] — per-entity-type detection stats for the dashboard heat map.
- POST /v1/scan/pii HTTP endpoint — standalone PII scan; returns {entities[], redacted_text}.
- GET /v1/spanforge/status — extended with an sf_pii status block.
- New types in spanforge.sdk._types: PIITextScanResult, PIIAnonymisedResult, PIIRedactionManifestEntry, PIIPipelineResult, PIIServiceStatus, PIIHeatMapEntry, ErasureReceipt, DSARExport, SafeHarborResult, TrainingDataPIIReport, PIIEntityResult.
- New exceptions in spanforge.sdk._exceptions: SFPIIBlockedError (HTTP 422, action=block), SFPIIDPDPConsentMissingError (DPDP consent missing for subject).
- China PIPL patterns in presidio_backend.py: Chinese national ID (\d{17}[\dX]), Chinese mobile (1[3-9]\d{9}), Chinese bank card.
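The three Safe Harbor transformations listed for safe_harbor_deidentify can be sketched with regular expressions. This toy version covers only ISO-formatted dates, ages written as "NN years old" or "NN y/o", and 5-digit ZIP codes, so it is far from the full 18-identifier rule. It also omits the requirement that the 3-digit ZIP prefix be replaced with 000 when the area it covers has 20,000 or fewer people.

```python
import re

_DATE = re.compile(r"\b(\d{4})-\d{2}-\d{2}\b")  # ISO dates only, for brevity
_AGE = re.compile(r"\b(\d{2,3})(\s*(?:years?\s+old|y/o))", re.IGNORECASE)
_ZIP = re.compile(r"\b(\d{3})\d{2}(?:-\d{4})?\b")  # ZIP or ZIP+4


def _cap_age(m: re.Match) -> str:
    age = int(m.group(1))
    return ("90+" if age > 89 else m.group(1)) + m.group(2)


def deidentify(text: str) -> str:
    text = _DATE.sub(r"\1", text)  # dates -> year
    text = _AGE.sub(_cap_age, text)  # ages over 89 -> "90+"
    text = _ZIP.sub(r"\1", text)  # ZIP -> first 3 digits
    return text
```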
Quality Gates (Phase 3)
- 273 Phase 3 tests passing, 1 presidio integration skip.
- 95% line coverage on sdk/pii.py; 92.31% repo-wide.
- ruff check ✅ | mypy --strict ✅ | bandit -r ✅
Phase 1: sf-identity + sf-pii Service SDK
Added — spanforge.sdk (Phase 1: sf-identity + sf-pii)
- SFIdentityClient (spanforge.sdk.identity) — full sf-identity API surface:
  - issue_api_key(scopes, key_format, quota_tier, ip_allowlist) — cryptographically signed key in sf_live_* / sf_test_* format (48 base62 chars).
  - rotate_api_key(old_key) — atomic rotation with immediate old-key revocation.
  - revoke_api_key(key) — single-use revocation; replays are silently ignored.
  - verify_api_key(key) — validates format, revocation state, IP allowlist, and rate limit.
  - create_session(api_key) — issues an HS256 JWT (RS256 when a remote service is configured).
  - verify_token(token) — returns JWTClaims; raises SFTokenInvalidError on tampering.
  - introspect_token(token) — returns TokenIntrospectionResult with expiry and scopes.
  - issue_magic_link(identifier, redirect_url) — 15-minute HMAC-signed single-use URL.
  - exchange_magic_link(token) — exchanges the token for a session JWT; replays raise SFTokenInvalidError.
  - enroll_totp(identifier) — RFC 6238 TOTP (SHA-1, 6 digits, 30 s); returns TOTPEnrollResult with a provisioning URI and 8 single-use backup codes.
  - verify_totp(identifier, code) — ±1 time-step drift tolerance; 5-failure lockout.
  - verify_backup_code(identifier, code) — single-use; stored as SHA-256 hashes only.
  - Brute-force lockout: 5 consecutive failures → 15-minute lockout (SFBruteForceLockedError).
  - IP allowlist enforcement (SFIPDeniedError).
  - Per-key sliding-window rate limiting (SFQuotaExceededError).
- SFPIIClient (spanforge.sdk.pii) — full sf-pii API surface:
  - scan(event) — deep regex PII scan; returns SFPIIScanResult (hits, field paths, types).
  - redact(event, policy) — applies a RedactionPolicy; returns SFPIIRedactResult.
  - contains_pii(event) — boolean check; never raises.
  - assert_redacted(event) — raises SFPIINotRedactedError with SHA-256-hashed context (never raw PII) if unredacted PII remains.
  - anonymize(text, sensitivity) — replaces PII patterns in raw strings; returns SFPIIAnonymizeResult with replacement count and labels.
  - wrap(event) — returns a Redactable wrapper for chained redaction.
  - make_policy(min_sensitivity, redacted_by) — convenience RedactionPolicy factory.
- spanforge.sdk._base — shared infrastructure:
  - SFClientConfig — dataclass loaded from env vars (SPANFORGE_ENDPOINT, SPANFORGE_API_KEY, SPANFORGE_LOCAL_FALLBACK, SPANFORGE_TLS_VERIFY). Supports from_env() and from_dict().
  - SFServiceClient — abstract base with HTTP retry (3 attempts, exponential back-off), circuit breaker (5 failures → OPEN, 30 s reset), and TLS verification.
  - _CircuitBreaker — thread-safe CLOSED → OPEN → CLOSED lifecycle.
  - _SlidingWindowRateLimiter — per-key, with configurable window and max calls.
- spanforge.sdk._types — value objects:
  - SecretStr — never exposed in __repr__ / __str__ / pickle; equality via hmac.compare_digest.
  - APIKeyBundle, JWTClaims, MagicLinkResult, TOTPEnrollResult, TokenIntrospectionResult, RateLimitInfo.
  - SFPIIScanResult, SFPIIHit, SFPIIRedactResult, SFPIIAnonymizeResult.
  - KeyFormat, KeyScope, QuotaTier enumerations.
- spanforge.sdk._exceptions — full exception hierarchy:
  - SFError base → SFAuthError, SFTokenInvalidError, SFScopeError, SFIPDeniedError, SFMFARequiredError, SFBruteForceLockedError, SFQuotaExceededError, SFRateLimitError, SFServiceUnavailableError, SFStartupError, SFKeyFormatError.
  - SFPIIError → SFPIIScanError, SFPIINotRedactedError, SFPIIPolicyError.
- Pre-built sf_identity and sf_pii singletons exported from spanforge.sdk. Configuration is auto-loaded from env vars on first import; call configure() to override.
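The TOTP parameters listed for enroll_totp and verify_totp (SHA-1, 6 digits, 30 s steps, ±1 step of drift) correspond to the RFC 6238 algorithm below. This is a standalone sketch, not the SDK's code: lockout, backup codes, and secret storage are left out, and the verify helper name is hypothetical.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: float, *, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP over HMAC-SHA-1 with RFC 4226 dynamic truncation."""
    counter = int(at // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, at: Optional[float] = None, *, drift: int = 1) -> bool:
    """Accept codes from the current step and up to `drift` neighbouring steps."""
    now = time.time() if at is None else at
    return any(hmac.compare_digest(totp(secret, now + k * 30), code)
               for k in range(-drift, drift + 1))
```

The RFC 6238 test vectors (secret "12345678901234567890", T = 59 gives the 8-digit code 94287082) are a convenient check for any implementation.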
Changed — Code Quality
ruff check src/ now passes with zero errors. Added 60 missing public-method docstrings across processor.py, prompt_registry.py, redact.py, sampling.py, and signing.py, and extended pyproject.toml with justified ignore and per-file-ignores entries for rule categories that are either inapplicable (lazy imports, module-state globals) or intentionally suppressed project-wide.
Upstream utility modules from sf-behaviour
Added — spanforge.http
- chat_completion(endpoint, model, messages, …) — zero-dependency, synchronous OpenAI-compatible HTTP client built on urllib.request. Retries on 429 / 5xx responses and network errors with exponential back-off (min(2**attempt, 8) s, up to max_retries attempts).
- ChatCompletionResponse frozen dataclass: text, latency_ms, error, prompt_tokens, completion_tokens, total_tokens, ok.
- Falls back to the OPENAI_API_KEY env var when no api_key is supplied.
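The retry schedule min(2**attempt, 8) grows 1, 2, 4, 8, 8, … seconds if attempt counts from zero. The sketch below assumes that zero-based counting and treats max_retries as the total number of attempts; RetryableError is a hypothetical marker for 429 / 5xx / network failures, and sleep is injectable so the delays can be observed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for 429 / 5xx / network failures."""


def with_retries(call: Callable[[], T], max_retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_retries):
        try:
            return call()
        except RetryableError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(min(2 ** attempt, 8))  # capped exponential back-off
    raise ValueError("max_retries must be >= 1")
```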
Added — spanforge.io
- write_jsonl(records, path, *, mode) — write an iterable of dicts as newline-delimited JSON; creates parent directories automatically.
- read_jsonl(path, *, event_type, skip_errors) — read all dicts from a JSONL file with optional event_type filtering and resilient error handling.
- append_jsonl(record, path) — single-record convenience wrapper.
- write_events(payloads, path, *, event_type, source, mode) — wraps each payload in a {"event_type": …, "source": …, "payload": …} envelope.
- read_events(path, *, event_type) — reads envelopes and returns unwrapped payloads filtered by type.
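The envelope convention of write_events and read_events can be sketched as follows. These are illustrative re-implementations with assumed defaults (mode="w", no filtering when event_type is None), not spanforge.io itself.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def write_events(payloads: Iterable[dict], path: Path, *, event_type: str,
                 source: str, mode: str = "w") -> None:
    path.parent.mkdir(parents=True, exist_ok=True)  # create parent directories
    with path.open(mode, encoding="utf-8") as fh:
        for payload in payloads:
            envelope = {"event_type": event_type, "source": source, "payload": payload}
            fh.write(json.dumps(envelope) + "\n")


def read_events(path: Path, *, event_type: Optional[str] = None) -> list[dict]:
    out: list[dict] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            env = json.loads(line)
            if event_type is None or env.get("event_type") == event_type:
                out.append(env["payload"])  # unwrap the envelope
    return out
```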
Added — spanforge.plugins
discover(group) — load all entry-point plugins registered under group. Handles the Python 3.9 / 3.10 / 3.12+ entry_points() API split; silently skips broken entry points.
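The entry_points() API split that discover(group) papers over looks roughly like this: Python 3.10 added entry_points(group=...) selection, and 3.12 removed the older dict-style interface that 3.9 returns. A sketch, assuming the result is a name-to-loaded-object mapping (the real return type is not stated here):

```python
import sys
from importlib.metadata import entry_points
from typing import Any


def discover(group: str) -> dict[str, Any]:
    """Load every entry point in `group`, skipping ones that fail to import."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)      # selection API, 3.10+
    else:
        eps = entry_points().get(group, [])  # dict interface, 3.9
    loaded: dict[str, Any] = {}
    for ep in eps:
        try:
            loaded[ep.name] = ep.load()
        except Exception:
            continue  # broken plugin: skip it silently
    return loaded
```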
Added — spanforge.schema
- `validate(instance, schema, path)` — lightweight, zero-dependency JSON Schema validator. Returns a list of error strings. Supports `type`, `enum`, `required`, `properties`, `items`, `minimum`, `maximum`, `minLength`, `maxLength`.
- `validate_strict(…)` — raises `SchemaValidationError` on any error.
- `SchemaValidationError` — `ValueError` subclass carrying an `errors` list.
- Correctly distinguishes `bool` from `integer` / `number` (Python's `isinstance(True, int)` is `True`, but JSON Schema treats them as separate types).
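The `bool` / `integer` distinction is the subtle part of such a validator. The sketch below covers only `type`, `required`, `properties`, `items`, and `minimum`; `validate_sketch` and its error-message wording are illustrative assumptions, not the `spanforge.schema` output format.

```python
from typing import Any, Callable, Dict, List

# bool is a subclass of int in Python, so integer/number must exclude it explicitly.
_TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}


def validate_sketch(instance: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of error strings (empty when `instance` is valid)."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected is not None and not _TYPE_CHECKS[expected](instance):
        errors.append(f"{path}: expected {expected}, got {type(instance).__name__}")
        return errors  # further keywords are meaningless for the wrong type
    if "minimum" in schema and _TYPE_CHECKS["number"](instance) and instance < schema["minimum"]:
        errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate_sketch(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate_sketch(item, schema["items"], f"{path}[{i}]"))
    return errors
```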
Added — spanforge.regression
- `RegressionDetector[T]` — generic per-case pass/fail regression detector. Identifies new failures and score drops between a baseline and current eval run.
- `RegressionReport[T]` — result dataclass with `new_failures`, `score_drops`, `has_regression`, and `summary()`.
- `compare(…)` — convenience one-shot function.
- Distinct from the existing `spanforge.eval.RegressionDetector` (mean-based); exposed as `PassFailRegressionDetector` at the top-level package to avoid a naming collision.
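Per-case pass/fail comparison differs from a mean-based check because one flipped case is enough to flag a regression. The sketch below assumes string case IDs and a hypothetical `max_drop` tolerance for score drops; `CaseResult`, `RegressionSketch`, and `compare_sketch` are illustrative names, not the `spanforge.regression` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CaseResult:
    passed: bool
    score: float


@dataclass
class RegressionSketch:
    new_failures: List[str] = field(default_factory=list)
    score_drops: List[Tuple[str, float]] = field(default_factory=list)

    @property
    def has_regression(self) -> bool:
        return bool(self.new_failures or self.score_drops)

    def summary(self) -> str:
        return f"{len(self.new_failures)} new failure(s), {len(self.score_drops)} score drop(s)"


def compare_sketch(
    baseline: Dict[str, CaseResult],
    current: Dict[str, CaseResult],
    max_drop: float = 0.1,
) -> RegressionSketch:
    report = RegressionSketch()
    for case_id, now in current.items():
        before = baseline.get(case_id)
        if before is None:
            continue  # cases absent from the baseline cannot regress
        if before.passed and not now.passed:
            report.new_failures.append(case_id)  # pass -> fail flip
        drop = before.score - now.score
        if drop > max_drop:
            report.score_drops.append((case_id, round(drop, 6)))
    return report
```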
Added — spanforge.stats
- `percentile(values, p)` — linear-interpolation percentile; does not mutate the input list.
- `latency_summary(values_ms)` — returns `{count, mean, min, max, p50, p95, p99}` rounded to 3 dp; returns zeroed output for empty input.
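Linear interpolation places the percentile at rank `(p / 100) * (n - 1)` in the sorted data and interpolates between the two neighbouring values. The sketch assumes `p` is on a 0–100 scale; `percentile_sketch` and `latency_summary_sketch` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def percentile_sketch(values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile, with p on a 0-100 scale."""
    if not values:
        return 0.0
    ordered = sorted(values)  # sorted() returns a copy; the input is not mutated
    rank = (p / 100) * (len(ordered) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def latency_summary_sketch(values_ms: Sequence[float]) -> Dict[str, float]:
    keys = ("count", "mean", "min", "max", "p50", "p95", "p99")
    if not values_ms:
        return {k: 0 for k in keys}  # zeroed output for empty input
    return {
        "count": len(values_ms),
        "mean": round(sum(values_ms) / len(values_ms), 3),
        "min": round(min(values_ms), 3),
        "max": round(max(values_ms), 3),
        "p50": round(percentile_sketch(values_ms, 50), 3),
        "p95": round(percentile_sketch(values_ms, 95), 3),
        "p99": round(percentile_sketch(values_ms, 99), 3),
    }
```

For `[10, 20, 30, 40]`, p50 sits at rank 1.5 and interpolates to 25.0.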
Added — spanforge._ansi
- `color(text, code, *, file)` — wraps *text* with ANSI escape codes. Suppressed automatically when `NO_COLOR` is set or the target file is not a TTY.
- `strip_ansi(text)` — strips all `\033[…m` sequences from a string (useful in tests and log processors).
- Color constants: `GREEN`, `RED`, `YELLOW`, `CYAN`, `BOLD`, `RESET`.
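The colour-suppression rules can be sketched as below. The functions are hypothetical stand-ins for `spanforge._ansi`; treating an empty `NO_COLOR` as unset follows the no-color.org convention and is an assumption about SpanForge's exact check.

```python
import os
import re
import sys
from typing import Optional, TextIO

GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"

_ANSI_SGR = re.compile(r"\033\[[0-9;]*m")  # matches SGR sequences such as \033[1;32m


def color_sketch(text: str, code: str, *, file: Optional[TextIO] = None) -> str:
    """Wrap `text` in `code`...RESET unless NO_COLOR is set or the stream is not a TTY."""
    stream = sys.stdout if file is None else file
    if os.environ.get("NO_COLOR"):  # present and non-empty disables colour
        return text
    isatty = getattr(stream, "isatty", None)
    if isatty is None or not isatty():
        return text  # pipes, files, and StringIO get plain text
    return f"{code}{text}{RESET}"


def strip_ansi_sketch(text: str) -> str:
    return _ANSI_SGR.sub("", text)
```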
Added — spanforge.eval.BehaviourScorer
- `BehaviourScorer` abstract base class — pluggable scorer for named test-case workflows. Subclasses implement `score(case, response) -> (float, reason)`. Distinct from the existing `EvalScorer` Protocol (which scores full `dict` examples).
- Registered via the `spanforge.scorers` entry-point group for third-party scorer packages.
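A subclass only has to implement `score(case, response)`. To keep the example self-contained it uses a local abstract stand-in rather than importing `spanforge.eval.BehaviourScorer`; `MentionsKeywordScorer` is a hypothetical scorer, and any extra attributes the real base class may require are not shown.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class BehaviourScorerStandIn(ABC):
    """Local stand-in for the documented contract: score(case, response) -> (float, reason)."""

    @abstractmethod
    def score(self, case: Any, response: str) -> Tuple[float, str]:
        ...


class MentionsKeywordScorer(BehaviourScorerStandIn):
    """Hypothetical scorer: 1.0 if the response mentions a keyword, else 0.0."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def score(self, case: Any, response: str) -> Tuple[float, str]:
        if self.keyword.lower() in response.lower():
            return 1.0, f"response mentions {self.keyword!r}"
        return 0.0, f"response does not mention {self.keyword!r}"
```

A third-party package would then advertise its scorer class under `[project.entry-points."spanforge.scorers"]` in its own `pyproject.toml`.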
Added — spanforge.config.interpolate_env()
- `interpolate_env(data)` — recursively walks `str` / `dict` / `list` structures and replaces `${VAR}` / `${VAR:default}` placeholders with environment variable values. Non-string leaves are returned unchanged. Unresolved variables with no default are left as-is.
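The substitution rules can be expressed with a single regular expression and a recursive walk. This is a sketch, not the `spanforge.config` source: `interpolate_env_sketch` and its injectable `env` parameter are assumptions added so the behaviour can be tested without touching `os.environ`.

```python
import os
import re
from typing import Any, Mapping, Optional

# ${NAME} or ${NAME:default}; the default may be empty but may not contain "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def interpolate_env_sketch(data: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    env = os.environ if env is None else env

    def _replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        return match.group(0)  # unresolved and no default: leave the placeholder as-is

    if isinstance(data, str):
        return _PLACEHOLDER.sub(_replace, data)
    if isinstance(data, dict):
        return {key: interpolate_env_sketch(value, env) for key, value in data.items()}
    if isinstance(data, list):
        return [interpolate_env_sketch(item, env) for item in data]
    return data  # non-string leaves (ints, bools, None) pass through unchanged
```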
Exposed at top-level (spanforge.*)
All new symbols are exported from the top-level spanforge package:
BehaviourScorer, ChatCompletionResponse, JsonSchemaValidationError,
PassFailRegressionDetector, RegressionReport, ansi_color,
append_jsonl, chat_completion, compare_regressions, discover_plugins,
interpolate_env, latency_summary, percentile, read_events,
read_jsonl, strip_ansi, validate_json_schema,
validate_json_schema_strict, write_events, write_jsonl.
Phase 2: sf-secrets — Secrets Scanning Engine
Added — spanforge.secrets (Phase 2)
- `SecretsScanner` — standalone secrets detection engine; no network calls required.
  - `scan(text, *, confidence_threshold=0.85)` → `SecretsScanResult`
  - `scan_batch(texts)` → `list[SecretsScanResult]` (asyncio parallel execution)
- 20-pattern registry: 7 spec-defined types + 13 industry-standard additions (GitHub PAT, npm token, Slack token, Stripe key, Twilio, SendGrid, Azure SAS, SSH private key, Google API key, Terraform Cloud, HashiCorp Vault token, generic JWT, SpanForge API key)
- Shannon entropy scorer (`entropy_score(s)`) — bits/char; boosts confidence for high-entropy tokens (≥ 3.5 bits/char, ≥ 32 chars)
- Three-tier confidence model: pattern-only → 0.75; + entropy → 0.90; + context keyword → 0.97
- Zero-tolerance auto-block for 10 high-risk types (Bearer Token, AWS Access Key, GCP Service Account, PEM Private Key, SSH Private Key, HallucCheck API key, SpanForge API key, GitHub PAT, Stripe live key, npm token)
- Configurable allowlist suppresses known test/placeholder values
- Span deduplication: highest-confidence hit wins per overlapping region
- `SecretHit` — frozen dataclass: `secret_type`, `start`, `end`, `confidence`, `redacted_value` (`[REDACTED:TYPE]` replacement)
- `SecretsScanResult` — result dataclass: `detected: bool`, `hits: list[SecretHit]`, `auto_blocked: bool`, `redacted_text: str`
  - `to_dict()` — JSON-serialisable dict
  - `to_sarif()` — SARIF 2.1.0 log object for GitHub Code Scanning / VS Code
- `entropy_score(s)` — Shannon entropy in bits/char; importable as a standalone utility
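Shannon entropy over a token's character distribution is `H = -Σ p_i · log2(p_i)` bits per character. The sketch below computes it and applies the two thresholds quoted above; `shannon_entropy` and `is_high_entropy` are hypothetical names, and how SpanForge feeds the result into its confidence tiers is not shown.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of `s`, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(count / n) * math.log2(count / n) for count in Counter(s).values())


def is_high_entropy(token: str, min_bits: float = 3.5, min_len: int = 32) -> bool:
    """Apply the thresholds quoted above: >= 3.5 bits/char and >= 32 characters."""
    return len(token) >= min_len and shannon_entropy(token) >= min_bits
```

A string of 32 distinct characters scores log2(32) = 5 bits/char, while `"ab" * 16` scores only 1 bit/char despite its length.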
Added — spanforge.sdk.secrets (Phase 2)
- `SFSecretsClient` — SDK client with local + remote modes:
  - `scan(text)` → `SecretsScanResult` — runs `SecretsScanner` locally; if a remote endpoint is configured, POSTs to `/v1/scan/secrets` with local fallback
  - `scan_batch(texts)` → `list[SecretsScanResult]` — asyncio parallel with sequential fallback
  - Inherits retry, circuit breaker, and TLS verification from `SFServiceClient`
- `sf_secrets` singleton exported from `spanforge.sdk` — eager-initialised from env vars alongside `sf_identity` and `sf_pii`
- Three new exceptions added to `spanforge.sdk._exceptions`:
  - `SFSecretsError` — base class for all sf-secrets errors
  - `SFSecretsBlockedError(secret_types, count)` — raised when the auto-block policy fires; `message` includes the detected type list
  - `SFSecretsScanError` — wraps unexpected scanner failures
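"Asyncio parallel with sequential fallback" can be sketched as: fan out with `asyncio.gather` when no event loop is running, otherwise scan one text at a time. `scan_batch_sketch` is a hypothetical helper, not `SFSecretsClient.scan_batch`; it requires Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


async def _scan_concurrently(scan: Callable[[str], R], texts: Sequence[str]) -> List[R]:
    # Each blocking scan runs in the default thread pool; gather preserves input order.
    return list(await asyncio.gather(*(asyncio.to_thread(scan, t) for t in texts)))


def scan_batch_sketch(scan: Callable[[str], R], texts: Sequence[str]) -> List[R]:
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread, so it is safe to start one and fan out.
        return asyncio.run(_scan_concurrently(scan, texts))
    # Already inside a running loop (e.g. a notebook or async server): go sequential.
    return [scan(t) for t in texts]
```

Thread fan-out mainly pays off when scans wait on I/O (such as the remote endpoint); pure-Python regex scanning is largely serialised by the GIL.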
Added — CLI command spanforge secrets scan (Phase 2)
spanforge secrets scan <file> [--format text|json|sarif] [--redact] [--confidence FLOAT]
| Flag | Description |
|---|---|
| `--format` | Output format: `text` (default), `json`, or `sarif` (SARIF 2.1.0) |
| `--redact` | Print redacted version of the file to stdout |
| `--confidence` | Override minimum confidence threshold (default: 0.85) |
Exit codes: 0 = clean, 1 = secrets detected, 2 = error / file not found
Added — Pre-commit hook .pre-commit-hooks.yaml (Phase 2)
- id: spanforge-secrets-scan
name: SpanForge Secrets Scan
entry: spanforge secrets scan
language: python
types: [text]
stages: [pre-commit, pre-push]
Covers Python, JavaScript/TypeScript, YAML, JSON, TOML, INI, and .env files.
Changed — spanforge.sdk (Phase 2)
- `sf_secrets: SFSecretsClient` singleton added alongside `sf_identity` and `sf_pii`
- `configure()` updated to accept secrets-specific overrides
- Baseline absent-service note for `sf-secrets` is now resolved — full implementation complete
Quality gates (Phase 2)
- 120 new tests (all passing) — `tests/test_sf_secrets.py`
- 4 147 total tests passing, 11 skipped
- 92.28% line coverage (≥ 90% gate enforced in CI)
- Ruff clean — zero errors across all Phase 2 files
2.0.2 — 2026-04-14
Compliance Integration Hardening & CostGuard Enhancements
Added — Built-in Evaluation Scorers
- `FaithfulnessScorer` — token-overlap scorer comparing LLM output against provided context. Returns a 0–1 score; label `"pass"` when overlap ≥ 0.5, `"skip"` when context or output is missing.
- `RefusalDetectionScorer` — heuristic scorer that detects common refusal phrases (e.g. "I'm sorry", "as an AI") via case-insensitive matching. Returns 1.0 / label `"refusal"` on match.
- `PIILeakageScorer` — wraps `scan_payload()` to flag PII in the `output` field. Returns 1.0 / label `"leak"` with hit-count metadata.
- All three are exported from the top-level `spanforge` package.
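One plausible reading of "token overlap" is the share of distinct output tokens that also appear in the context. The sketch below uses that definition plus a hypothetical `"fail"` label below the 0.5 threshold; both are assumptions, and `faithfulness_sketch` is not the `FaithfulnessScorer` implementation.

```python
import re
from typing import Optional, Tuple

_WORD = re.compile(r"\w+")


def faithfulness_sketch(output: Optional[str], context: Optional[str]) -> Tuple[float, str]:
    """Share of distinct output tokens that also appear in the context."""
    if not output or not context:
        return 0.0, "skip"
    out_tokens = set(_WORD.findall(output.lower()))
    ctx_tokens = set(_WORD.findall(context.lower()))
    if not out_tokens:
        return 0.0, "skip"  # punctuation-only output has nothing to check
    score = len(out_tokens & ctx_tokens) / len(out_tokens)
    return score, ("pass" if score >= 0.5 else "fail")
```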
Added — Eval Dataset CLI (spanforge eval)
- `spanforge eval save --input EVENTS.jsonl --output DATASET.jsonl` — extracts evaluation examples from event payloads (output, context, reference, input, span/trace IDs) into a reusable JSONL dataset.
- `spanforge eval run --file DATASET.jsonl [--scorers S1,S2] [--format text|json]` — runs selected built-in scorers over a JSONL dataset and prints a summary. Supports the `faithfulness`, `refusal`, and `pii_leakage` scorer names.
Added — Compliance Status CLI
- `spanforge compliance status --events-file FILE [--framework FRAMEWORK]` — outputs a single JSON summary with chain integrity, PII scan results, per-clause coverage, last attestation timestamp, and events-analysed count.
Added — LangSmith Migration CLI
- `spanforge migrate-langsmith FILE [--output FILE] [--source NAME]` — reads a LangSmith export (JSONL or JSON array) and converts runs to SpanForge events (llm → `TRACE_SPAN_COMPLETED`, tool → `TOOL_CALL_COMPLETED`), preserving token usage, timing, input/output, and error info. Events are tagged with `langsmith_run_id`, `langsmith_trace_id`, and `langsmith_parent_id`.
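The conversion can be sketched as format detection plus a `run_type` lookup. Assumptions: run fields (`id`, `trace_id`, `parent_run_id`, `run_type`, `inputs`, `outputs`, `error`, `start_time`, `end_time`) follow LangSmith's run schema, other run types such as `"chain"` are skipped, token usage is omitted because its location varies, and the event types are plain strings standing in for SpanForge enum members. `load_runs` and `convert_run` are hypothetical names.

```python
import json
from typing import Any, Dict, List, Optional

# Mapping stated in the changelog; other run types (e.g. "chain") are skipped here.
RUN_TYPE_TO_EVENT = {"llm": "TRACE_SPAN_COMPLETED", "tool": "TOOL_CALL_COMPLETED"}


def load_runs(text: str) -> List[Dict[str, Any]]:
    """Accept either a JSON array or JSONL (one run object per line)."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def convert_run(run: Dict[str, Any], source: str = "langsmith") -> Optional[Dict[str, Any]]:
    event_type = RUN_TYPE_TO_EVENT.get(run.get("run_type", ""))
    if event_type is None:
        return None
    tag_sources = {
        "langsmith_run_id": run.get("id"),
        "langsmith_trace_id": run.get("trace_id"),
        "langsmith_parent_id": run.get("parent_run_id"),
    }
    return {
        "event_type": event_type,
        "source": source,
        "payload": {
            "inputs": run.get("inputs"),
            "outputs": run.get("outputs"),
            "error": run.get("error"),
            "start_time": run.get("start_time"),
            "end_time": run.get("end_time"),
        },
        # Only tag IDs that are present; root runs have no parent.
        "tags": {k: str(v) for k, v in tag_sources.items() if v is not None},
    }
```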
Added — Gemini Provider Integration
- `spanforge.integrations.gemini` — auto-instrumentation for Google Gemini (`google-generativeai`).
- `patch()` / `unpatch()` wrap and restore `GenerativeModel.generate_content` and its async variant.
- `normalize_response(response, *, model_name)` — extracts tokens from `usage_metadata`, strips the `models/` prefix, and uses `GenAISystem.GOOGLE`.
- `GEMINI_PRICING` table — covers gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro, and more.
- Install: `pip install spanforge[gemini]`
Added — Bedrock Provider Integration
- `spanforge.integrations.bedrock` — integration for the AWS Bedrock Runtime Converse API.
- `normalize_converse_response(response, *, model_id)` — extracts tokens from `response["usage"]` (`inputTokens` / `outputTokens`) and uses `GenAISystem.AWS_BEDROCK`.
- `BEDROCK_PRICING` table — covers Claude 3 (Sonnet/Haiku/Opus), Titan (Text/Embed), Llama 3, Mistral, and Cohere on Bedrock.
- Install: `pip install spanforge[bedrock]`
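A Converse API response is a plain dict, so normalisation is mostly careful key access. The sketch reads `output.message.content[*].text`, `usage.inputTokens` / `outputTokens` / `totalTokens`, and `metrics.latencyMs` from the Converse response shape; `normalize_converse_sketch` and its output keys are hypothetical and do not reproduce SpanForge's normalised format.

```python
from typing import Any, Dict, Mapping, Optional


def normalize_converse_sketch(response: Mapping[str, Any], *, model_id: str) -> Dict[str, Any]:
    """Pull text, token counts, and latency out of a Converse API response dict."""
    usage = response.get("usage") or {}
    input_tokens = int(usage.get("inputTokens", 0))
    output_tokens = int(usage.get("outputTokens", 0))
    blocks = ((response.get("output") or {}).get("message") or {}).get("content") or []
    latency: Optional[int] = (response.get("metrics") or {}).get("latencyMs")
    return {
        "model_id": model_id,
        # Concatenate text blocks; non-text blocks (e.g. tool use) contribute nothing.
        "text": "".join(block.get("text", "") for block in blocks),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": int(usage.get("totalTokens", input_tokens + output_tokens)),
        "latency_ms": latency,
    }
```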
Added — Presidio PII Backend
- `spanforge.presidio_backend` — optional Presidio-powered PII detection backend, gated behind `pip install spanforge[presidio]`.
- `presidio_scan_payload(payload, *, language, score_threshold)` — walks the payload recursively using Presidio's `AnalyzerEngine`, maps entity types to SpanForge labels, and returns a standard `PIIScanResult`.
Changed — Security Default: scan_raw=True
- `contains_pii()` and `assert_redacted()` now default to `scan_raw=True`, catching raw-string PII by default. Pass `scan_raw=False` to restore the previous behaviour.
- Fixed the `isinstance` check to use `Mapping` instead of `dict`, so `scan_raw` works correctly with `Event.payload` (which returns a `MappingProxyType`).
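The `Mapping` fix matters because a `MappingProxyType` is not a `dict`, so a walker that tests `isinstance(value, dict)` never descends into `Event.payload`. A minimal walker illustrating the corrected check follows; `iter_string_leaves` and its `$.key[index]` path format are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Iterator, Tuple


def iter_string_leaves(value: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every string leaf in nested mappings and sequences."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, Mapping):  # dict, MappingProxyType, and other read-only views
        for key, child in value.items():
            yield from iter_string_leaves(child, f"{path}.{key}")
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            yield from iter_string_leaves(child, f"{path}[{index}]")
```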
Changed — GenAISystem Enum
- Added `GOOGLE = "google"` to the `GenAISystem` enum in `namespaces/trace.py`.
Changed — pyproject.toml
- New optional dependency groups: `presidio`, `gemini`, `bedrock`.
Added — India PII Pattern Pack (DPDP Act)
- `DPDP_PATTERNS` named constant — ships Aadhaar and PAN number regex detectors for India's Digital Personal Data Protection Act compliance. Pass as `extra_patterns=DPDP_PATTERNS` to `scan_payload()`.
- Aadhaar detection — matches 12-digit numbers (`XXXX XXXX XXXX`, `XXXX-XXXX-XXXX`, or contiguous) starting with digits 2–9, validated with the Verhoeff checksum algorithm (the checksum rejects about 90% of random 12-digit strings, so false positives are rare).
- PAN detection — matches the `[A-Z]{5}[0-9]{4}[A-Z]` format (Person, Company, Trust, etc.).
- Both types mapped to sensitivity `"high"` in `_SENSITIVITY_MAP`.
- Exported from the top-level `spanforge` package.
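The Verhoeff tables below are the standard ones; everything else is an assumption. `find_aadhaar` is a hypothetical detector, not the `DPDP_PATTERNS` regex, and its `\1` back-reference (forcing the same separator between groups) is a design choice of this sketch.

```python
import re
from typing import List

# Standard Verhoeff tables: multiplication in the dihedral group D5 (_D),
# position-dependent permutation (_P), and multiplicative inverse (_INV).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 8, 7, 0, 6],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(number: str) -> bool:
    """True when the trailing check digit of `number` is correct."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(number: str) -> str:
    """Check digit to append to `number`."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return str(_INV[c])


# First digit 2-9; groups separated consistently by a space, a dash, or nothing.
_AADHAAR_RE = re.compile(r"\b[2-9]\d{3}([ -]?)\d{4}\1\d{4}\b")


def find_aadhaar(text: str) -> List[str]:
    """Return regex matches whose digits also pass the Verhoeff checksum."""
    hits = []
    for match in _AADHAAR_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if verhoeff_valid(digits):
            hits.append(match.group(0))
    return hits
```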
Added — Extended PII Pattern Coverage
- `date_of_birth` pattern — detects dates of birth across all major global formats (years 1900–2099):
  - ISO / year-first: `YYYY-MM-DD`, `YYYY/MM/DD`, `YYYY.MM.DD`
  - US month-first: `MM/DD/YYYY`, `MM-DD-YYYY`, `MM.DD.YYYY`
  - Day-first (UK, EU, Germany, Asia, Australia, Latin America): `DD/MM/YYYY`, `DD-MM-YYYY`, `DD.MM.YYYY`
  - Written day-first: `15 Jan 2000`, `15-Jan-2000`, `15 January 2000`
  - Written month-first: `Jan 15, 2000`, `January 15 2000`

  Secondary calendar validation via `_is_valid_date()` rejects impossible dates (e.g. `02/30/1990`, `31/04/1990`). Mapped to sensitivity `"high"`.
- `address` pattern — detects street addresses (`<number> <name> <suffix>`) with a curated suffix list (Street/St, Avenue/Ave, Road/Rd, Boulevard/Blvd, Drive/Dr, Lane/Ln, Court/Ct, Way, Place/Pl, Circle/Cir, Trail/Trl, Terrace/Ter, Parkway/Pkwy, Highway/Hwy, Route/Rte). Mapped to sensitivity `"medium"`.
- `_is_valid_ssn(ssn_str)` — SSA range validator applied post-regex to every SSN match in `scan_payload()`. Rejects area `000`, area `666`, areas `900`–`999` (ITIN-reserved), group `00`, and serial `0000`, eliminating the most common false-positive ranges.
- `_is_valid_date(date_str)` — calendar-correctness validator applied post-regex to every `date_of_birth` match. Tries 15 `strptime` format strings covering all numeric and written-month orderings; delegates to `datetime.strptime` for accurate month-length and leap-year enforcement.
- Both validators follow the same pattern as the existing `_luhn_check()` and `_verhoeff_check()` — applied inside `scan_payload._walk()` after the regex pass.
Fixed — Compliance Attestation with Missing Signing Key
`generate_evidence_package()`, `to_pdf()`, and `verify_attestation_signature()` previously raised `ValueError` when `SPANFORGE_SIGNING_KEY` was not set in the environment. They now emit a `logging.WARNING` and fall back to an insecure internal default (`_INSECURE_DEFAULT_KEY`). Production deployments must always set `SPANFORGE_SIGNING_KEY`; the default key exists only for development and CI environments.
Added — Compliance Dashboard in SPA Viewer
- Clause pass/fail table — clicking the compliance chip in the `spanforge serve` / `spanforge ui` header opens a full compliance dashboard showing per-framework clause breakdown (clause ID, description, PASS/FAIL badge) and score percentages.
- Chain integrity banner — prominent status display for chain verification: verified (green), not verified (warning), or tampered (red with count).
- Overview stat grid — total events, signed events, PII hits, events with PII, and explanation coverage percentage.
- Model registry card — lists all models observed in event payloads with invocation counts, sources, and last-seen timestamps.
- Back to Traces navigation — returns to the standard trace/event list view.
Added — Multi-Agent Cost Rollup
- Child run cost propagation — `AgentRunContext` gains a `_child_run_costs` accumulator and a `record_child_run_cost()` method. `AgentRunContextManager.__exit__` now automatically propagates the child run's `CostBreakdown` to the parent run on the `contextvars` stack. The parent `AgentRunPayload.total_cost` includes both its own step costs and all nested child agent costs.
Added — Unified Provider Pricing Table
- `get_pricing()` is now cross-provider — searches the OpenAI, Anthropic, Groq, and Together AI pricing tables automatically via lazy imports. Callers (e.g. `_calculate_cost()`) no longer need to know which provider a model belongs to.
- `list_models()` returns all providers — aggregates model names from all four pricing tables.
- `_lookup_in_table()` internal helper — handles exact match, date-suffix stripping, and Together AI `org/model` key formats.
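The three lookup strategies can be sketched as a fall-through chain. Assumptions: the step order, the two date-suffix shapes (`-YYYY-MM-DD` and `-YYYYMMDD`), and that an `org/model` name is retried as the bare model name. Table values are placeholder strings, not real prices; `lookup_in_table_sketch` and `get_pricing_sketch` are hypothetical names.

```python
import re
from typing import Mapping, Optional, Sequence, TypeVar

V = TypeVar("V")

# Matches trailing date stamps such as "-2024-08-06" or "-20241022".
_DATE_SUFFIX = re.compile(r"-(?:\d{4}-\d{2}-\d{2}|\d{8})$")


def lookup_in_table_sketch(model: str, table: Mapping[str, V]) -> Optional[V]:
    if model in table:  # 1. exact match
        return table[model]
    undated = _DATE_SUFFIX.sub("", model)
    if undated in table:  # 2. date suffix stripped
        return table[undated]
    if "/" in model:  # 3. "org/model" form: retry with the bare model name
        bare = model.split("/", 1)[1]
        if bare in table:
            return table[bare]
    return None


def get_pricing_sketch(model: str, tables: Sequence[Mapping[str, V]]) -> Optional[V]:
    """Search each provider table in order and return the first hit."""
    for table in tables:
        hit = lookup_in_table_sketch(model, table)
        if hit is not None:
            return hit
    return None
```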
Added — Per-Run Cost Report CLI
- `spanforge cost run --run-id <id> --input <file.jsonl>` — new CLI subcommand that reads a JSONL events file, filters `llm.cost.*` and `llm.trace.agent.completed` events by run ID, and prints a formatted table with agent name, status, duration, per-model cost breakdown, and total cost. Exits with code 1 when no events match and code 2 on file errors.
Added — Consent Boundary Monitoring in Compliance Mapping
- GDPR Art. 22 (new clause) — `consent.*` and `hitl.*` events now map to "Automated Individual Decision-Making — consent and oversight".
- GDPR Art. 25 — `consent.*` events added to the "Data Protection by Design" prefix list alongside `llm.redact.*`.
Added — HITL Hooks in Compliance Mapping
- EU AI Act Art. 14 (new clause) — `hitl.*` and `consent.*` events now map to "Human Oversight — HITL review and escalation".
- EU AI Act Annex IV.5 — `hitl.*` events added alongside `llm.guard.*` and `llm.audit.*`.
Added — Model Registry Attestation Enrichment
- `ComplianceAttestation` gains `model_owner`, `model_risk_tier`, `model_status`, and `model_warnings` fields — populated automatically from `ModelRegistry` when a registered model is found.
- SOC 2 CC6.1 — `model_registry.*` events added to the access control clause.
- NIST MAP 1.1 — `model_registry.*` events added to the risk mapping clause.
- Warnings emitted for deprecated, retired, or unregistered models.
Added — Explainability in Compliance Mapping
- EU AI Act Art. 13 (new clause) — `explanation.*` events map to "Transparency — explainability of AI decisions".
- NIST MAP 1.1 — `explanation.*` events added alongside trace and eval prefixes.
- `ComplianceAttestation` gains an `explanation_coverage_pct` field — percentage of decision events (`llm.trace.*` / `hitl.*`) with matching `explanation.*` events.
- The `/compliance/summary` HTTP endpoint now includes `explanation_coverage_pct`.
Changed
- 40 new compliance mapping tests (76 total); 19 new CostGuard gap tests; 26 new India PII + dashboard tests; full suite: 3 376 passing.
- Fixed flaky `test_sign_verify_roundtrip` Hypothesis property test by suppressing `HealthCheck.too_slow`.
1.0.0 — 2026-04-13
GA Release — Production Hardening & Multi-Tenant Support
This release implements all 28 items from the SpanForge v1.0 GA Addendum. All changes are backward-compatible; no existing public API was removed.
Added — GA-01: Signing Key Security
- `validate_key_strength(org_secret, min_length=None) -> list[str]` — checks key length (min 32 chars / 256-bit), repeated characters, well-known placeholders, and mixed character classes. Returns a list of warnings.
- `check_key_expiry(expires_at) -> tuple[str, int]` — returns `(status, days)` where status is `"no_expiry"`, `"expired"`, `"expiring_soon"`, or `"valid"`.
- `derive_key()` gains a `context` parameter — appends `"|" + context` to the passphrase before PBKDF2 derivation, enabling environment isolation (e.g. `"staging"` vs `"production"`).
- `sign()` checks key expiry — raises `SigningError` when the configured key has expired.
- `SPANFORGE_SIGNING_KEY_MIN_BITS` env var — configures minimum key length in bits (divided by 8 for character count).
- `SPANFORGE_SIGNING_KEY_EXPIRES_AT` env var — ISO-8601 date for key expiry.
- `SPANFORGE_SIGNING_KEY_CONTEXT` env var — context string for `derive_key`.
- `configure()` now calls `validate_key_strength()` when `signing_key` is set, logging warnings for weak keys.
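Context-separated derivation and expiry classification can be sketched with `hashlib.pbkdf2_hmac` and `datetime.date`. The salt handling, iteration count, 32-byte output, 30-day "expiring soon" window, and negative-days convention are assumptions, and both functions are hypothetical stand-ins for `derive_key()` and `check_key_expiry()`.

```python
import hashlib
from datetime import date
from typing import Optional, Tuple


def derive_key_sketch(
    passphrase: str,
    salt: bytes,
    *,
    context: Optional[str] = None,
    iterations: int = 600_000,
) -> bytes:
    """PBKDF2-HMAC-SHA256 with an optional context suffix for environment isolation."""
    material = passphrase if context is None else passphrase + "|" + context
    return hashlib.pbkdf2_hmac("sha256", material.encode("utf-8"), salt, iterations, dklen=32)


def check_key_expiry_sketch(
    expires_at: Optional[str],
    *,
    today: Optional[date] = None,
    warn_days: int = 30,
) -> Tuple[str, int]:
    """Return (status, days_until_expiry); negative days means already expired."""
    if not expires_at:
        return "no_expiry", 0
    days = (date.fromisoformat(expires_at) - (today or date.today())).days
    if days < 0:
        return "expired", days
    if days <= warn_days:
        return "expiring_soon", days
    return "valid", days
```

Same passphrase, different context: the staging and production keys are unrelated, so a key leaked from one environment cannot sign events for the other.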
Added — GA-02: Audit Chain Hardening
- `ChainVerificationResult` gains `tombstone_count` and `tombstone_event_ids` fields for GDPR right-to-erasure tracking.
- `AuditStream` lock scope narrowed — HMAC computation runs outside the lock to reduce contention under concurrent appends.
Added — GA-03: Deep PII Scanning
- `PIIScanHit` dataclass — `pii_type`, `path`, `match_count`, `sensitivity`. No `snippet` field (matched values are never exposed).
- `scan_payload()` gains a `max_depth` parameter (default 10) to cap recursion depth.
- `_luhn_check()` — Luhn algorithm validation for credit card pattern matches, reducing false positives.
- `contains_pii()` and `assert_redacted()` gain a `scan_raw` keyword — when `True`, they also run regex-based PII scanning (not just `Redactable` checks).
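The Luhn (mod 10) check used to filter card-number matches is short enough to show in full. `luhn_valid` is a hypothetical stand-in for `_luhn_check()`; ignoring spaces and dashes is an assumption about how matches are normalised.

```python
def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check; non-digit separators such as spaces and dashes are ignored."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Double every second digit counting from the rightmost (check) digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0
```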
Added — GA-04: Multi-Tenant Key Resolution
- `KeyResolver` protocol — `resolve(org_id) -> str`.
- `StaticKeyResolver` — returns the same key for every org.
- `EnvKeyResolver` — resolves from `SPANFORGE_KEY_{ORG_ID}` env vars.
- `DictKeyResolver` — resolves from an in-memory `{org_id: secret}` dict.
- `verify_chain()` gains `key_resolver` and `default_key` parameters — per-org key resolution for multi-tenant chains.
- `AuditStream` gains `key_resolver` and `require_org_id` parameters — per-event key resolution during append, and strict `org_id` enforcement.
- `SPANFORGE_REQUIRE_ORG_ID` env var — when `true`, signing raises `SigningError` if `event.org_id` is `None`.
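Because `KeyResolver` is a protocol, any object with a matching `resolve()` method qualifies. The classes below are hypothetical stand-ins for the three built-in resolvers; raising `KeyError` for unknown orgs and the env-var naming rule (upper-case, `-` mapped to `_`) are assumptions.

```python
import os
from typing import Mapping, Optional, Protocol


class KeyResolverSketch(Protocol):
    def resolve(self, org_id: str) -> str: ...


class StaticResolver:
    def __init__(self, key: str) -> None:
        self._key = key

    def resolve(self, org_id: str) -> str:
        return self._key  # same key for every org


class DictResolver:
    def __init__(self, keys: Mapping[str, str]) -> None:
        self._keys = dict(keys)

    def resolve(self, org_id: str) -> str:
        try:
            return self._keys[org_id]
        except KeyError:
            raise KeyError(f"no signing key for org {org_id!r}") from None


class EnvResolver:
    def __init__(self, env: Optional[Mapping[str, str]] = None) -> None:
        self._env: Mapping[str, str] = os.environ if env is None else env

    def resolve(self, org_id: str) -> str:
        # Assumed naming rule: upper-case the org id and map "-" to "_".
        var = "SPANFORGE_KEY_" + org_id.upper().replace("-", "_")
        try:
            return self._env[var]
        except KeyError:
            raise KeyError(f"environment variable {var} is not set") from None


def key_for(resolver: KeyResolverSketch, org_id: str) -> str:
    # Structural typing: no inheritance from the Protocol is needed.
    return resolver.resolve(org_id)
```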
Added — GA-05: Schema Migration (Working Implementation)
- `MigrationStats` dataclass — `total`, `migrated`, `skipped`, `errors`, `warnings`, `output_path`, `transformed_fields`.
- `v1_to_v2()` now works — no longer raises `NotImplementedError`. Handles both `Event` and `dict` inputs. Normalises `model` → `model_id`, coerces tag values to strings, and re-hashes md5 checksums to sha256. Idempotent.
- `migrate_file()` — bulk JSONL migration with `org_secret` re-signing, `target_version`, and `dry_run` support.
- Internal helpers: `_rehash_md5_to_sha256()`, `_coerce_tag_values()`.
Added — GA-06: Async Audit Stream
- `AsyncAuditStream` — asyncio-native audit chain using `asyncio.Lock`. Mirrors the `AuditStream` API: `await stream.append(event)`, `await stream.rotate_key(...)`, `await stream.verify()`.
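An asyncio-native HMAC chain needs the lock so that concurrent appends each link to the correct predecessor. `AsyncChainSketch` is hypothetical: it omits key rotation, and the record layout, zero genesis link, and `json.dumps(..., sort_keys=True)` canonicalisation are assumptions rather than SpanForge's signing format.

```python
import asyncio
import hashlib
import hmac
import json
from typing import Any, Dict, List

_GENESIS = "0" * 64


class AsyncChainSketch:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self._lock = asyncio.Lock()
        self._records: List[Dict[str, Any]] = []
        self._prev = _GENESIS

    def _sign(self, body: Dict[str, Any], prev: str) -> str:
        # Each signature covers the previous signature, linking the chain.
        message = prev.encode("utf-8") + json.dumps(body, sort_keys=True).encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    async def append(self, body: Dict[str, Any]) -> str:
        async with self._lock:  # serialise appends so every record links to its true predecessor
            sig = self._sign(body, self._prev)
            self._records.append({"body": body, "prev": self._prev, "sig": sig})
            self._prev = sig
            return sig

    async def verify(self) -> bool:
        async with self._lock:
            prev = _GENESIS
            for rec in self._records:
                expected = self._sign(rec["body"], prev)
                if rec["prev"] != prev or not hmac.compare_digest(rec["sig"], expected):
                    return False  # tampered body, reordered record, or broken link
                prev = rec["sig"]
            return True
```

Create the stream inside a running coroutine (as in the usage below) so the lock is bound to the right event loop on older Python versions.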
Added — GA-07: Event Unknown Fields
- `Event._unknown_fields` — preserves unrecognised fields during `from_dict()` round-trips. Accessible via the `event.unknown_fields` property. Included in `to_dict()` output for lossless serialisation.
Added — GA-08: CLI Enhancements
- `spanforge scan` — new `--types` filter and `--fail-on-match` exit-code flag. Snippet field removed from output (matched values are never exposed).
- `spanforge migrate` — new `--target-version`, `--sign`, and `--dry-run` flags for bulk JSONL migration.
- `spanforge check-health` — new `--output json` flag, PII scan step, egress configuration check, and exit code 1 on any failure.
- `spanforge rotate-key` — defaults output to `.rotated.jsonl` and re-verifies the chain after rotation.
Changed
- `spanforge.__version__` is `"1.0.0"`.
- Minimum signing key length raised from 0 to 32 characters (256-bit).
- `_server.py` compliance summary includes chain verification and PII data.
- `_server.py` events endpoint supports prefix matching, an `hmac_valid` filter, and pagination with a 30-second poll interval.
Test Suite
- 3162 tests passing, 10 skipped, 91.74% line coverage.
- 7 new conformance tests (C011–C017).
- 28 new migration tests, ~30 new signing tests, ~12 new config tests.
- Concurrent `AuditStream` benchmark test.
1.0.7 — 2026-03-09
Instrumentation Engine — Seven Tools Complete
This release delivers the complete instrumentation engine planned in
spanforge-IMPL-PLAN.md. All seven tools are implemented, tested, and
fully exported from the top-level spanforge namespace. All changes are
backward-compatible; no existing public API was removed.
Added
- Tool 1 — `@trace()` decorator (`spanforge.trace`, `spanforge.export.otlp_bridge`)
  - `@trace(name, span_kind, attributes)` — wraps sync and async functions, auto-emits `llm.trace.span` start/end events with timing and error capture.
  - `SpanOTLPBridge`, `span_to_otlp_dict()` — converts spanforge span events to OpenTelemetry proto-compatible dicts for OTLP/gRPC export.
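A decorator that handles both sync and async functions has to pick its wrapper at decoration time, because awaiting must happen inside an `async def`. `trace_sketch` is hypothetical: it appends plain dicts to a list instead of emitting `llm.trace.span` events, and omits `span_kind` and `attributes`.

```python
import functools
import inspect
import time
from typing import Any, Callable, Dict, List, Optional

EMITTED: List[Dict[str, Any]] = []  # stand-in sink; a real tracer would emit events


def trace_sketch(name: Optional[str] = None) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        span_name = name or fn.__qualname__

        def _finish(start: float, error: Optional[BaseException]) -> None:
            EMITTED.append({
                "name": span_name,
                "duration_ms": (time.perf_counter() - start) * 1000,
                "error": None if error is None else type(error).__name__,
            })

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    result = await fn(*args, **kwargs)
                except BaseException as exc:
                    _finish(start, exc)  # record the error, then re-raise unchanged
                    raise
                _finish(start, None)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except BaseException as exc:
                _finish(start, exc)
                raise
            _finish(start, None)
            return result
        return sync_wrapper
    return decorate
```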
- Tool 2 — Cost Calculation Engine (`spanforge.cost`)
  - `CostTracker` — tracks cumulative token costs per model across a session.
  - `BudgetMonitor` — per-session USD budget with threshold alerts.
  - `@budget_alert(limit_usd, on_exceed)` — fires a callback when the session budget is exceeded.
  - `emit_cost_event()`, `emit_cost_attributed()` — emit `llm.cost.*` events.
  - `cost_summary()` — aggregates totals over a list of `CostRecord` objects.
  - `CostRecord` — immutable dataclass capturing model, tokens, and USD cost.
- Tool 3 — Tool Call Inspector (`spanforge.inspect`)
  - `InspectorSession` — context manager that intercepts tool calls within a trace and records their arguments, results, latency, and errors.
  - `inspect_trace(trace_id)` — returns a list of `ToolCallRecord` objects for a completed trace.
  - `ToolCallRecord` — dataclass with `tool_name`, `arguments`, `result`, `duration_ms`, `error`, and `span_id` fields.
- Tool 4 — Tool Schema Builder (`spanforge.toolsmith`)
  - `@tool(name, description, tags)` — registers a function as a typed tool in the default registry; infers parameters from type annotations.
  - `ToolRegistry` — manages a collection of `ToolSchema` objects; supports `register()`, `get()`, `list_tools()`, and `unregister()`.
  - `build_openai_schema(tool)` — renders a `ToolSchema` as an OpenAI function-calling JSON object.
  - `build_anthropic_schema(tool)` — renders a `ToolSchema` as an Anthropic tool-use JSON object.
  - `ToolSchema`, `ToolParameter`, `ToolValidationError`, `default_registry`.
- Tool 5 — Retry and Fallback Engine (`spanforge.retry`)
  - `@retry(max_attempts, backoff, exceptions, on_retry)` — retries a sync/async callable with exponential back-off; emits retry events.
  - `FallbackChain(*providers)` — tries providers in order; falls back on any exception.
  - `CircuitBreaker(failure_threshold, recovery_timeout)` — closed / open / half-open state machine; raises `CircuitOpenError` when open.
  - `CostAwareRouter(providers)` — routes each call to the cheapest available provider given the current `CostTracker` state.
  - `AllProvidersFailedError`, `CircuitOpenError`, `CircuitState`.
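The circuit-breaker state machine can be sketched in a few dozen lines. `CircuitBreakerSketch`, `State`, and `CircuitOpen` are hypothetical stand-ins for `CircuitBreaker`, `CircuitState`, and `CircuitOpenError`; allowing a single trial call in the half-open state is an assumption, and the injectable `clock` exists only to make the example testable.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpen(RuntimeError):
    """Raised instead of calling the wrapped function while the circuit is open."""


class CircuitBreakerSketch:
    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpen("circuit is open; call rejected")
            self.state = State.HALF_OPEN  # timeout elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call, or too many failures while closed, (re)opens the circuit.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()

    def _record_success(self) -> None:
        self._failures = 0
        self.state = State.CLOSED
```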
- Tool 6 — Semantic Cache Engine (`spanforge.cache`)
  - `SemanticCache(backend, similarity_threshold, ttl_seconds, namespace, embedder, max_size, emit_events)` — prompt deduplication via cosine similarity; pluggable backends.
  - `@cached(threshold, ttl, namespace, backend, tags, emit_events)` — decorator for sync and async functions; supports both bare `@cached` and `@cached(...)` forms.
  - `InMemoryBackend(max_size)` — LRU in-process store, thread-safe.
  - `SQLiteBackend(db_path)` — persistent store using stdlib `sqlite3`.
  - `RedisBackend(host, port, db, prefix)` — distributed store; requires the optional `redis` package.
  - Emits `llm.cache.hit`, `llm.cache.miss`, `llm.cache.written`, and `llm.cache.evicted` events when `emit_events=True`.
  - `CacheBackendError`, `CacheEntry`.
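Cosine-similarity lookup is the core of a semantic cache: embed the prompt, find the most similar cached prompt, and return its value only if the similarity clears the threshold. `SemanticCacheSketch` is hypothetical and ignores TTLs, namespaces, eviction, and backends; `toy_embed` is a deliberately crude letter-count embedder for illustration only.

```python
import math
import string
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str) -> Vector:
    # Hypothetical embedder for illustration only: letter counts over a-z.
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in string.ascii_lowercase]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCacheSketch:
    def __init__(self, embed: Callable[[str], Vector], similarity_threshold: float = 0.9) -> None:
        self._embed = embed
        self._threshold = similarity_threshold
        self._entries: List[Tuple[Vector, str]] = []

    def set(self, prompt: str, value: str) -> None:
        self._entries.append((self._embed(prompt), value))

    def get(self, prompt: str) -> Optional[str]:
        """Return the most similar cached value if it clears the threshold."""
        query = self._embed(prompt)
        best_value, best_sim = None, -1.0
        for vector, value in self._entries:  # linear scan; fine for a sketch
            sim = cosine(query, vector)
            if sim > best_sim:
                best_value, best_sim = value, sim
        return best_value if best_sim >= self._threshold else None
```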
- Tool 7 — SDK Instrumentation Linter (`spanforge.lint`)
  - `run_checks(source, filename) -> list[LintError]` — parses Python source with `ast` and runs all AO-code checks.
  - `LintError(code, message, filename, line, col)` — dataclass returned by every check.
  - AO001 — `Event()` missing one of `event_type`, `source`, or `payload`.
  - AO002 — bare `str` literal passed to `actor_id`, `session_id`, or `user_id` (should use `Redactable()`).
  - AO003 — `event_type=` string literal not present in registered `EventType` values.
  - AO004 — LLM provider API call (`.chat.completions.create()` etc.) outside a `with tracer.span()` / `agent_run()` context.
  - AO005 — `emit_span` / `emit_agent_*` called outside an `agent_run()` / `agent_step()` context.
  - flake8 plugin — registered as `AO = "spanforge.lint._flake8:SpanForgeChecker"` via `[project.entry-points."flake8.extension"]`; all five codes surface natively in flake8 output.
  - CLI — `python -m spanforge.lint [FILES_OR_DIRS...]`; exits `0` (clean) or `1` (errors found).
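An AST check such as AO001 boils down to walking `ast.Call` nodes and inspecting their keywords. `check_ao001` and `LintErrorSketch` are hypothetical; the sketch only recognises calls spelled `Event(...)` with keyword arguments and skips calls that use `**kwargs`, which may differ from the real checker.

```python
import ast
from dataclasses import dataclass
from typing import List

REQUIRED_EVENT_KWARGS = ("event_type", "source", "payload")


@dataclass
class LintErrorSketch:
    code: str
    message: str
    filename: str
    line: int
    col: int


def check_ao001(source: str, filename: str = "<string>") -> List[LintErrorSketch]:
    """Flag Event(...) calls that omit a required keyword argument."""
    errors: List[LintErrorSketch] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        is_event_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "Event"
        )
        if not is_event_call:
            continue
        if any(kw.arg is None for kw in node.keywords):
            continue  # **kwargs: the keys cannot be known statically
        given = {kw.arg for kw in node.keywords}
        missing = [name for name in REQUIRED_EVENT_KWARGS if name not in given]
        if missing:
            errors.append(LintErrorSketch(
                "AO001", "Event() missing " + ", ".join(missing),
                filename, node.lineno, node.col_offset,
            ))
    return errors
```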
Test suite
- 3 032 tests passing, 42 skipped, ≥ 92.84 % line and branch coverage.
1.0.6 — 2026-03-07
Architect Review — Developer Experience & Reliability Improvements
All changes are backward-compatible; no existing public API was removed.
Added
- `spanforge/testing.py` — first-class test utilities: `MockExporter`, the `capture_events()` context manager, `assert_event_schema_valid()`, and the `trace_store()` isolated store context manager. Write unit tests for your AI pipeline without real exporters.
- `spanforge/auto.py` — integration auto-discovery. Call `spanforge.auto.setup()` to auto-patch every installed LLM integration (OpenAI, Anthropic, Ollama, Groq, Together AI). `setup()` must be called explicitly — `import spanforge.auto` alone does not patch anything. `spanforge.auto.teardown()` cleanly unpatches all.
- Async hooks (`spanforge._hooks`) — `AsyncHookFn` type alias and four new async registration methods on `HookRegistry`: `on_agent_start_async()`, `on_agent_end_async()`, `on_llm_call_async()`, `on_tool_call_async()`. Async hooks are fired via `asyncio.ensure_future()` on the running loop and silently skipped when no loop is running.
- `spanforge check` CLI — new `spanforge check` sub-command performs a five-step end-to-end health check (config → event creation → schema validation → export pipeline → trace store) and exits 0/1.
- `trace_store()` context manager (`spanforge.trace_store`) — installs a fresh, isolated `TraceStore` for the duration of a `with` block and restores the previous singleton on exit. Exported at package level.
- Export retry with back-off (`spanforge._stream`) — the dispatch pipeline now retries failed exports up to `export_max_retries` times (default: 3) with exponential back-off (0.5 s, 1 s, 2 s …). Configurable via `spanforge.configure(export_max_retries=N)`.
- Structured export logging — `logging.getLogger("spanforge.export")` now emits `WARNING`-level messages on every export error and `DEBUG`-level messages on each retry attempt.
- Export error counter — `spanforge._stream.get_export_error_count()` returns the cumulative count of export errors since process start; useful for health-check endpoints.
- `unpatch()` / `is_patched()` for all three callback-based integrations (`crewai`, `langchain`, `llamaindex`) — consistent unpatch API across every integration module.
- `NotImplementedWarning` (`spanforge.migrate`) — `v1_to_v2()` now emits a `NotImplementedWarning` via `warnings.warn()` before raising `NotImplementedError`, so tools that filter warnings still see the signal. `v1_to_v2` is removed from `spanforge.__all__`.
- `assert_no_sunset_reached()` (`spanforge.assert_no_sunset_reached`) — CI helper that raises `AssertionError` listing any `SunsetPolicy` records whose `sunset` version is ≤ the current SDK version.
- Frozen payload dataclasses — `SpanPayload`, `AgentStepPayload`, and `AgentRunPayload` are now `@dataclass(frozen=True)`; attempts to mutate a completed span record now raise `FrozenInstanceError` immediately.
- Custom exporter tutorial — new doc at `docs/user_guide/custom_exporters.md` covering the `SyncExporter` protocol, HTTP + batching examples, error handling, and test patterns.
Changed
- `spanforge.__version__` bumped from `"1.0.5"` to `"1.0.6"`.
- `HookRegistry.__repr__` now includes both sync and async hook counts.
- `spanforge.__all__` updated: added `AsyncHookFn`, `assert_no_sunset_reached`, `NotImplementedWarning`, `trace_store`, `testing`, `auto`; removed `v1_to_v2`.
2.0.0 (previous) — 2026-03-07
Phases 1–5 — Core Foundation, Compliance Infrastructure, Developer Experience, Production Analytics, Ecosystem Expansion
This release is a comprehensive upgrade of the SDK runtime. All changes are backward-compatible unless noted; no existing public API was removed.
Added — Phase 1: Core Foundation
- `contextvars`-based context propagation — the three internal stacks (`_span_stack_var`, `_run_stack_var`) are now `contextvars.ContextVar` tuples instead of `threading.local` lists. Context flows correctly across `asyncio` tasks, `loop.run_in_executor` thread pools, and `concurrent.futures` workers. Sync code is unaffected.
- `copy_context()` (`spanforge.copy_context`) — returns a shallow copy of the current `contextvars.Context` for manually spawned threads or executor tasks. Re-exported at the top-level `spanforge` package.
- Async context-manager support — `SpanContextManager`, `AgentRunContextManager`, and `AgentStepContextManager` now implement `__aenter__` / `__aexit__`, so `async with tracer.span(...)`, `async with tracer.agent_run(...)`, and `async with tracer.agent_step(...)` all work without any API change.
- `Trace` class (`spanforge.Trace`) — a first-class object returned by `start_trace()` that holds a reference to the root span and accumulates all child spans. Convenience methods: `llm_call()`, `tool_call()`, `end()`, `to_json()`, `save()`, `print_tree()`, `summary()`. Supports `with start_trace(...) as trace:` and `async with start_trace(...) as trace:`.
- `start_trace(agent_name, **attributes)` (`spanforge.start_trace`) — opens a new trace, pushes a root `AgentRunContextManager` onto the context stack, and returns a `Trace` object that acts as the root context for all child spans. Re-exported at the top-level `spanforge` package.
Added — Phase 2: Compliance Infrastructure
- `SpanEvent` dataclass (`spanforge.namespaces.trace.SpanEvent`) — a named, timestamped event (nanosecond resolution) with an open-ended `metadata: dict` field. Participates in `to_dict()` / `from_dict()` round-trips.
- `Span.add_event(name, metadata=None)` — append a `SpanEvent` to the active span at any point during its lifetime.
- `SpanErrorCategory` type alias (`spanforge.types.SpanErrorCategory`) — typed `Literal` for `"agent_error"`, `"llm_error"`, `"tool_error"`, `"timeout_error"`, `"unknown_error"`. Built-in exception types (`TimeoutError`, `asyncio.TimeoutError`) are auto-mapped to `"timeout_error"` by `Span.record_error()`.
- `Span.record_error(exc, category=...)` — enhanced to accept an optional `category: SpanErrorCategory`; stores `error_category` on the span and in `SpanPayload.error_category`.
- `Span.set_timeout_deadline(seconds)` — schedules a background timer that sets `status = "timeout"` and `error_category = "timeout_error"` if the span is not closed within the deadline.
- LLM span schema extensions — `SpanPayload` gains three optional fields: `temperature: float | None`, `top_p: float | None`, `max_tokens: int | None`. Existing calls that do not set these fields are unaffected.
- Tool span schema extensions — `ToolCall` gains:
  - `arguments_raw: str | None` — raw tool arguments (populated only when `SpanForgeConfig.include_raw_tool_io = True`; the redaction policy is applied before storage).
  - `result_raw: str | None` — raw tool result (same opt-in flag).
  - `retry_count: int | None` — zero-based retry counter.
  - `external_api: str | None` — identifier for the external service called.
- `SpanForgeConfig.include_raw_tool_io` (`bool`, default `False`) — opt-in flag that controls whether `arguments_raw` / `result_raw` are stored. When a `RedactionPolicy` is configured, raw values are passed through `redact.redact_value()` before storage.
Added — Phase 3: Developer Experience
- spanforge.debug module — standalone debug utilities (also available as methods on Trace):
  - print_tree(spans, *, file=None) — pretty-prints a hierarchical span tree with Unicode box-drawing characters, duration, token counts, and costs. Respects the NO_COLOR environment variable.
  - summary(spans) -> dict — returns an aggregated statistics dictionary: trace_id, agent_name, total_duration_ms, span_count, llm_calls, tool_calls, total_input_tokens, total_output_tokens, total_cost_usd, errors.
  - visualize(spans, output="html", *, path=None) -> str — generates a self-contained HTML Gantt-timeline string (no external dependencies). Pass path="trace.html" to write directly to a file.
- print_tree, summary, and visualize are re-exported from the top-level spanforge package.
- Sampling controls added to SpanForgeConfig:
  - sample_rate: float = 1.0 — fraction of traces to emit (0.0–1.0). The decision is made per trace_id (deterministic SHA-256 hash), so all spans of a trace are always sampled together.
  - always_sample_errors: bool = True — spans/traces with status = "error" or "timeout" are always emitted regardless of sample_rate.
  - trace_filters: list[Callable[[Event], bool]] — custom per-event predicates evaluated after the probabilistic gate.
- SPANFORGE_SAMPLE_RATE environment variable — overrides sample_rate at startup.
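Below is a rough sketch of a deterministic per-trace sampling decision. It assumes the trace_id is hashed with SHA-256 and the first 8 digest bytes are mapped onto [0, 1). The name should_sample and the byte-slicing detail are illustrative assumptions, not spanforge internals.

```python
import hashlib
from typing import Optional


def should_sample(
    trace_id: str,
    sample_rate: float,
    status: Optional[str] = None,
    always_sample_errors: bool = True,
) -> bool:
    """Deterministic per-trace sampling decision (illustrative sketch)."""
    if always_sample_errors and status in ("error", "timeout"):
        return True  # error/timeout spans bypass the probabilistic gate
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    # Map the trace_id onto [0, 1) using the first 8 bytes of its SHA-256 digest.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

The decision depends only on trace_id, not on random(). It is therefore reproducible across processes and hosts, and every span of a trace gets the same answer.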
Added — Phase 4: Production Analytics
- spanforge.metrics module:
  - aggregate(events) -> MetricsSummary — single-call aggregation over any Iterable[Event] (file, in-memory list, or TraceStore).
  - MetricsSummary dataclass — trace_count, span_count, agent_success_rate, avg_trace_duration_ms, p50_trace_duration_ms, p95_trace_duration_ms, total_input_tokens, total_output_tokens, total_cost_usd, llm_latency_ms (LatencyStats), tool_failure_rate, token_usage_by_model, cost_by_model.
  - agent_success_rate(events), llm_latency(events), tool_failure_rate(events), token_usage(events) — focused single-metric helpers.
  - Re-exported from the top-level package: after import spanforge, call spanforge.metrics.aggregate(events).
- spanforge._store.TraceStore — in-memory ring buffer (bounded to SpanForgeConfig.trace_store_size, default 100) that retains the last N traces for programmatic access:
  - get_trace(trace_id) → list[Event] | None
  - get_last_agent_run() → list[Event] | None
  - list_tool_calls(trace_id) → list[SpanPayload]
  - list_llm_calls(trace_id) → list[SpanPayload]
  - clear()
- Module-level convenience functions re-exported from spanforge: get_trace(), get_last_agent_run(), list_tool_calls(), list_llm_calls().
- SpanForgeConfig.enable_trace_store (bool, default False) — enables the TraceStore ring buffer. When a RedactionPolicy is configured, events are redacted before storage.
- SpanForgeConfig.trace_store_size (int, default 100) — maximum number of traces retained in the ring buffer.
- SPANFORGE_ENABLE_TRACE_STORE=1 environment variable override.
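A bounded trace store can be built on an OrderedDict used as an LRU-style ring buffer, as sketched below. RingTraceStore and record() are hypothetical names. This changelog does not say whether spanforge evicts the first-seen or the least recently active trace; this sketch assumes the latter.

```python
import threading
from collections import OrderedDict
from typing import Any, Optional


class RingTraceStore:
    """Hypothetical bounded store keeping the events of the N most recent traces."""

    def __init__(self, max_traces: int = 100) -> None:
        self._max = max_traces
        self._traces: "OrderedDict[str, list[Any]]" = OrderedDict()
        self._lock = threading.Lock()

    def record(self, trace_id: str, event: Any) -> None:
        with self._lock:
            if trace_id in self._traces:
                self._traces[trace_id].append(event)
                self._traces.move_to_end(trace_id)  # most recently active trace goes last
            else:
                self._traces[trace_id] = [event]
                if len(self._traces) > self._max:
                    self._traces.popitem(last=False)  # evict the least recently active trace

    def get_trace(self, trace_id: str) -> Optional[list[Any]]:
        with self._lock:
            events = self._traces.get(trace_id)
            return list(events) if events is not None else None  # return a copy

    def clear(self) -> None:
        with self._lock:
            self._traces.clear()
```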
Added — Phase 5: Ecosystem Expansion
- spanforge._hooks.HookRegistry — callback registry for global span lifecycle hooks with a decorator API:
  - @hooks.on_agent_start / @hooks.on_agent_end
  - @hooks.on_llm_call
  - @hooks.on_tool_call
  - hooks.clear() — unregister all hooks (useful in tests)
  - Thread-safe via threading.RLock.
- spanforge.hooks — module-level singleton HookRegistry, re-exported from the top-level spanforge package:

  ```python
  import spanforge

  @spanforge.hooks.on_llm_call
  def my_hook(span):
      print(f"LLM called: {span.model}")
  ```

- spanforge.integrations.crewai — CrewAI event handler:
  - SpanForgeCrewAIHandler — callback handler that emits llm.trace.* events for agent actions, task lifecycle, and tool calls. Follows the same pattern as LLMSchemaCallbackHandler.
  - patch() — convenience function that registers the handler into CrewAI globally (guards with importlib.util.find_spec("crewai") so the module is safely importable without CrewAI installed).
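The registry pattern behind a decorator-based hook API fits in a few lines. MiniHookRegistry and fire() are hypothetical. The sketch assumes only what is listed above: decorator registration, clear(), and a threading.RLock. Callbacks are snapshotted under the lock and invoked outside it, so a slow hook does not block registration from other threads.

```python
import threading
from collections import defaultdict
from typing import Any, Callable

Hook = Callable[[Any], None]


class MiniHookRegistry:
    """Illustrative decorator-based hook registry guarded by a re-entrant lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._hooks: "defaultdict[str, list[Hook]]" = defaultdict(list)

    def _register(self, kind: str, fn: Hook) -> Hook:
        with self._lock:
            self._hooks[kind].append(fn)
        return fn  # return the function unchanged so it still works as a decorator

    def on_llm_call(self, fn: Hook) -> Hook:
        return self._register("llm_call", fn)

    def on_tool_call(self, fn: Hook) -> Hook:
        return self._register("tool_call", fn)

    def fire(self, kind: str, span: Any) -> None:
        with self._lock:
            callbacks = list(self._hooks.get(kind, ()))  # snapshot under the lock
        for cb in callbacks:
            cb(span)

    def clear(self) -> None:
        with self._lock:
            self._hooks.clear()
```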
Changed
- spanforge.__version__: 1.0.6 → 2.0.0
1.0.6 — 2026-03-07
Phase 6 — OpenAI Auto-Instrumentation
Added
- spanforge.integrations.openai — zero-boilerplate OpenAI tracing. Calling patch() monkey-patches both openai.resources.chat.completions.Completions.create (sync) and AsyncCompletions.create (async), so every chat completion automatically populates the active spanforge span with token usage, model info, and a computed cost breakdown.
  - patch() / unpatch() — idempotent lifecycle; safe to call multiple times; unpatch() fully restores the original methods.
  - is_patched() — returns True after patch(); returns False if OpenAI is not installed or unpatch() has been called.
  - normalize_response(response) -> (TokenUsage, ModelInfo, CostBreakdown) — extracts all available token counts (input, output, total, cached, reasoning) and computes USD cost from the static pricing table.
  - _auto_populate_span(response) — updates the active span if one is present; silently skips if no span is active or if the span already has token_usage set; swallows all instrumentation errors so they never surface in user code.
- spanforge.integrations._pricing — static OpenAI pricing table (USD per 1M tokens) covering GPT-4o, GPT-4o-mini, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo, o1, o1-mini, o1-preview, o3-mini, o3, and the text-embedding-3-* / ada-002 families. Prices reflect OpenAI's published rates as of 2026-03-04.
  - get_pricing(model) — exact lookup with automatic date-suffix stripping fallback (e.g. "gpt-4o-2024-11-20" → "gpt-4o").
  - list_models() — sorted list of all known model names.
  - PRICING_DATE = "2026-03-04" — snapshot date attached to every CostBreakdown for auditability.
- 68 new tests in tests/test_phase6_openai_integration.py covering pricing table correctness, normalize_response field mapping, all _compute_cost branches (cached discount, o1/o3 reasoning rate, non-negative clamp, pricing-date attachment), _auto_populate_span (including the except Exception: pass instrumentation-error-swallow branch), patch lifecycle, async wrapper, and end-to-end tracer integration.
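The following Python sketch illustrates the date-suffix fallback in get_pricing(). The prices below are placeholders, not OpenAI's published rates. The regex for a trailing -YYYY-MM-DD suffix is an assumption about how the stripping works. Cached-token discounts and reasoning rates are omitted.

```python
import re
from typing import Dict, Optional

# Placeholder prices in USD per 1M tokens; NOT real OpenAI rates.
_PRICING: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"input": 1.00, "output": 4.00},
    "gpt-4o-mini": {"input": 0.10, "output": 0.40},
}
_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def get_pricing(model: str) -> Optional[Dict[str, float]]:
    """Exact lookup first; on a miss, retry with a trailing -YYYY-MM-DD stripped."""
    if model in _PRICING:
        return _PRICING[model]
    return _PRICING.get(_DATE_SUFFIX.sub("", model))


def compute_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Cost = tokens x (USD per 1M tokens) / 1,000,000, summed over input and output."""
    prices = get_pricing(model)
    if prices is None:
        return None
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
```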
Fixed
- openai.py — _PATCH_FLAG consistency: patch() and unpatch() now use setattr / delattr with the _PATCH_FLAG constant instead of hardcoding the string "_spanforge_patched", eliminating a silent mismatch risk if the constant is ever renamed.
- openai.py docstring: usage example corrected from spanforge.span() to spanforge.tracer.span().
Coverage
- spanforge/integrations/openai.py: 100 % (was 99 %)
- spanforge/integrations/_pricing.py: 100 %
- Total suite: 2 407 tests, 97.00 % coverage
1.0.5 — 2026-03-06
Version bump
- Bumped version to 1.0.5 across pyproject.toml, spanforge/__init__.py, docs, and tests.
- Completed full rename from tracium to spanforge across the entire codebase.
1.0.4 — 2026-03-05
Version bump
- Bumped version to 1.0.4 across pyproject.toml, spanforge/__init__.py, docs, and tests.
1.0.3 — 2026-03-05
Version bump
- Updated version references in docs/index.md and docs/changelog.md to match pyproject.toml.
1.0.2 — 2026-03-04
Packaging fix
- Added PyPI badge (links to https://pypi.org/project/spanforge/) to README, docs index, and installation page.
- Fixed remaining relative spanforge Standard link in docs/index.md.
1.0.1 — 2026-03-04
Packaging fix
- Fixed broken spanforge Standard link on PyPI project page — now points to https://www.getspanforge.com/standard.
1.0.0 — 2026-03-04
Phase 10 — CLI Tooling
- spanforge validate EVENTS_JSONL — schema-validates every event in a JSONL file; prints per-line errors.
- spanforge audit-chain EVENTS_JSONL — verifies HMAC signing-chain integrity; reads spanforge_SIGNING_KEY from the environment.
- spanforge inspect EVENT_ID EVENTS_JSONL — pretty-prints a single event looked up by event_id.
- spanforge stats EVENTS_JSONL — prints a summary of event counts, token totals, estimated cost, and timestamp range.
Phase 11 — Security & Privacy Pipeline
- Auto-redaction via configure() — passing redaction_policy= to configure() wires RedactionPolicy.apply() into the _dispatch() path; every emitted span/event is redacted before being handed to the exporter.
- Auto-signing via configure() — passing signing_key= to configure() wires HMAC-SHA256 signing into the dispatch path; every event is signed and chained to the previous one automatically.
- Pipeline order guaranteed — redaction always runs before signing, so each signature covers the already-redacted payload.
- _reset_exporter() closes file handles — calling _reset_exporter() now flushes and closes any open SyncJSONLExporter file handle and clears the HMAC chain state, preventing ResourceWarning in tests and on shutdown.
- examples/ — four runnable sample scripts: openai_chat.py, agent_workflow.py, langchain_chain.py, secure_pipeline.py.
- Version: 0.2.0 → 1.0.0; coverage threshold: 99 % → 90 %.
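The redact-then-sign ordering can be sketched as follows. The redact, sign, and dispatch functions are assumptions made for illustration. So is the top-level key masking, and so are the exact bytes fed to HMAC (the previous signature concatenated with canonical JSON). spanforge's actual chaining format may differ.

```python
import hashlib
import hmac
import json
from typing import Any, Iterable, Optional


def redact(payload: dict[str, Any], sensitive_keys: frozenset[str]) -> dict[str, Any]:
    """Hypothetical redaction step: mask values of sensitive top-level keys."""
    return {k: ("[REDACTED]" if k in sensitive_keys else v) for k, v in payload.items()}


def sign(payload: dict[str, Any], key: bytes, prev_signature: Optional[str]) -> str:
    """HMAC-SHA256 over canonical JSON, chained to the previous signature."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    mac = hmac.new(key, (prev_signature or "").encode("utf-8") + body, hashlib.sha256)
    return "hmac-sha256:" + mac.hexdigest()


def dispatch(
    events: Iterable[dict[str, Any]], key: bytes, sensitive_keys: frozenset[str]
) -> list[tuple[dict[str, Any], str]]:
    """Redact first, then sign, so each signature covers the already-redacted payload."""
    out: list[tuple[dict[str, Any], str]] = []
    prev: Optional[str] = None
    for event in events:
        clean = redact(event, sensitive_keys)
        prev = sign(clean, key, prev)
        out.append((clean, prev))
    return out
```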
0.1.0 — 2026-03-04
Changed
- Package renamed from llm-toolkit-schema to spanforge — PyPI distribution is spanforge (pip install spanforge), import name is spanforge. The old package name is a deprecated shim that re-exports from spanforge and emits a DeprecationWarning.
- Schema version bumped to 2.0 (SpanForge AI Compliance Standard RFC-0001 v2.0).
- 36 canonical EventType values registered (RFC-0001 Appendix B).
- 11 namespace payload modules ship 42 v2.0 dataclasses under spanforge.namespaces.*.
- TokenUsage fields renamed: prompt_tokens → input_tokens, completion_tokens → output_tokens, total → total_tokens.
- ModelInfo field change: provider (plain string) replaced by system (GenAISystem enum, aligned with OTel gen_ai.system).
- SpanPayload replaces SpanCompletedPayload / TracePayload. New sibling payloads: AgentStepPayload, AgentRunPayload.
- CacheHitPayload replaces CachePayload; CostTokenRecordedPayload replaces CostPayload; EvalScoreRecordedPayload replaces EvalPayload; FenceValidatedPayload replaces FencePayload; PromptRenderedPayload replaces PromptPayload; RedactPiiDetectedPayload replaces RedactPayload; TemplateRegisteredPayload replaces TemplatePayload; DiffComputedPayload replaces DiffPayload.
- spanforge.namespaces.audit — new module: AuditKeyRotatedPayload, AuditChainVerifiedPayload, AuditChainTamperedPayload.
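Stored payloads that still use the old TokenUsage keys can be migrated with a simple mapping. upgrade_token_usage is a hypothetical helper, not part of spanforge. It encodes only the three renames listed above.

```python
from typing import Any

# Old -> new TokenUsage field names, as listed in the entry above.
TOKEN_USAGE_RENAMES = {
    "prompt_tokens": "input_tokens",
    "completion_tokens": "output_tokens",
    "total": "total_tokens",
}


def upgrade_token_usage(old: dict[str, Any]) -> dict[str, Any]:
    """Rename legacy keys; keys that are already current or unknown pass through unchanged."""
    return {TOKEN_USAGE_RENAMES.get(k, k): v for k, v in old.items()}
```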
1.0.0-rc.3 — 2026-03-15
Added
- OTelBridgeExporter (spanforge.export.otel_bridge) — exports events through any configured OpenTelemetry TracerProvider. Requires the [otel] extra (opentelemetry-sdk>=1.24). Unlike OTLPExporter, this bridge uses the SDK's span lifecycle, so all registered SpanProcessor instances (sampling, batching, auto-instrumentation hooks) fire normally.
- make_traceparent(trace_id, span_id, *, sampled=True) (spanforge.export.otlp) — constructs a W3C Trace Context traceparent header string (RFC 9429).
- extract_trace_context(headers) (spanforge.export.otlp) — parses traceparent / tracestate headers and returns a dict of {trace_id, span_id, sampled[, tracestate]}.
- gen_ai.* semantic convention attributes (GenAI semconv 1.27+) — to_otlp_span() now emits gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.operation.name, and gen_ai.response.finish_reasons from the corresponding payload.* fields, enabling native LLM dashboards in Grafana, Honeycomb, and Dynatrace.
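Below is a minimal sketch of building and parsing a version-00 traceparent header. It follows the W3C Trace Context layout: version, 32-hex trace-id, 16-hex parent-id, and 2-hex flags, where bit 0x01 means sampled. build_traceparent and parse_traceparent are hypothetical names, not spanforge's make_traceparent / extract_trace_context. Header keys are assumed to be lowercase.

```python
import re
from typing import Any, Optional

_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: str, span_id: str, *, sampled: bool = True) -> str:
    """Version-00 header: 00-<32 hex trace-id>-<16 hex parent-id>-<flags>."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id.lower()}-{span_id.lower()}-{flags}"


def parse_traceparent(headers: dict[str, str]) -> Optional[dict[str, Any]]:
    raw = headers.get("traceparent", "").strip().lower()
    m = _TRACEPARENT.match(raw)
    if m is None:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    ctx: dict[str, Any] = {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
    if "tracestate" in headers:
        ctx["tracestate"] = headers["tracestate"]
    return ctx
```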
Fixed
- deployment.environment.name — ResourceAttributes.to_otlp() now emits the semconv 1.21+ key deployment.environment.name instead of the legacy deployment.environment.
- spanKind — to_otlp_span() now sets kind: 3 (CLIENT) as required by the OTLP specification.
- traceFlags — to_otlp_span() now sets traceFlags: 1 (sampled) on every span context.
- endTimeUnixNano — computed correctly as startTimeUnixNano + payload.duration_ms × 1 000 000; previously omitted.
- status.code / status.message — payload.status values "error" and "timeout" now map to OTLP STATUS_CODE_ERROR (2); "ok" maps to STATUS_CODE_OK (1). Previously the status block was always empty.
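The timing and status fixes reduce to a few lines of arithmetic and mapping. otlp_span_timing_and_status is a hypothetical helper. The numeric constants are the OTLP SpanKind and Status.StatusCode enum values. The 64-bit timestamps are emitted as decimal strings, which is how the protobuf JSON mapping encodes 64-bit integers. Mapping unknown statuses to UNSET is an assumption.

```python
from typing import Any

STATUS_CODE_UNSET, STATUS_CODE_OK, STATUS_CODE_ERROR = 0, 1, 2  # OTLP Status.StatusCode
SPAN_KIND_CLIENT = 3  # OTLP SpanKind.SPAN_KIND_CLIENT


def otlp_span_timing_and_status(start_unix_nano: int, duration_ms: float, status: str) -> dict[str, Any]:
    """Hypothetical helper mirroring the fixes above; keys follow OTLP/JSON naming."""
    end_unix_nano = start_unix_nano + round(duration_ms * 1_000_000)  # ms -> ns
    if status in ("error", "timeout"):
        status_block: dict[str, Any] = {"code": STATUS_CODE_ERROR, "message": status}
    elif status == "ok":
        status_block = {"code": STATUS_CODE_OK}
    else:
        status_block = {"code": STATUS_CODE_UNSET}
    return {
        "kind": SPAN_KIND_CLIENT,
        "startTimeUnixNano": str(start_unix_nano),
        "endTimeUnixNano": str(end_unix_nano),
        "status": status_block,
    }
```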
1.0.0-rc.2 — 2026-03-15
Fixed
- Event.payload now returns a read-only MappingProxyType — mutating the returned object no longer silently corrupts event state.
- EventGovernancePolicy(strict_unknown=True) now correctly raises GovernanceViolationError for unregistered event types (previously a no-op); docstring corrected to match actual behaviour.
- _cli.py — broad except Exception replaced with the typed tuple (DeserializationError, SchemaValidationError, KeyError, TypeError), preventing silent swallowing of unexpected errors.
- stream.py — broad except Exception in EventStream.from_file and EventStream.from_kafka replaced with (LLMSchemaError, ValueError).
- validate.py — checksum regex tightened to ^sha256:[0-9a-f]{64}$ and signature regex to ^hmac-sha256:[0-9a-f]{64}$, aligning with the prefixes actually produced by signing.py (the previous bare 64-hex patterns accepted invalid values).
- export/datadog.py:
  - Fallback span/trace IDs are now deterministic SHA-256 derivations of the event ID instead of Python hash() (non-reproducible across processes).
  - Span start timestamp uses event.timestamp rather than wall-clock time.
  - dd_site is validated as a hostname (no scheme/path).
  - agent_url is validated as an http:// or https:// URL.
- export/otlp.py — export_batch now chunks the event list by batch_size and issues one request per chunk; previously the parameter was accepted but never applied. URL scheme validated on construction.
- export/webhook.py — URL scheme validated on construction (http:// or https:// only).
- export/grafana.py — URL scheme validated on construction.
- redact.py — _has_redactable / _count_redactable use the collections.abc.Mapping ABC instead of dict, so payloads built from MappingProxyType or other mapping types are handled correctly.
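Python's built-in hash() on strings is salted per process (see PYTHONHASHSEED), so it cannot produce IDs that are stable across runs. A deterministic alternative to the Datadog fallback-ID fix is sketched below. derive_ids, the trace:/span: prefixes, and taking the first 8 digest bytes are illustrative assumptions.

```python
import hashlib
from typing import Tuple


def derive_ids(event_id: str) -> Tuple[int, int]:
    """Stable (trace_id, span_id) pair derived from an event ID via SHA-256."""

    def _u64(label: str) -> int:
        digest = hashlib.sha256(f"{label}:{event_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")  # first 8 bytes -> unsigned 64-bit int

    return _u64("trace"), _u64("span")
```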
Added
- GuardPolicy (spanforge.namespaces.guard) — runtime input/output guardrail enforcement with configurable fail-open / fail-closed mode and callable checker injection.
- FencePolicy (spanforge.namespaces.fence) — structured-output validation driver with a retry-sequence loop and a max_retries limit.
- TemplatePolicy (spanforge.namespaces.template) — variable presence checking and output validation for prompt-template workflows.
- iter_file(path) (spanforge.stream) — synchronous generator that streams events from an NDJSON file without buffering the entire file.
- aiter_file(path) (spanforge.stream) — async-generator equivalent of iter_file.
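Line-by-line streaming of an NDJSON file can be done with a plain generator. iter_ndjson is hypothetical and yields plain dicts, whereas spanforge's iter_file is described as yielding events. Skipping blank lines and raising ValueError with the line number are assumptions.

```python
import json
from typing import Any, Iterator


def iter_ndjson(path: str) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-blank line, without reading the whole file."""
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
```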
1.0.0-rc.1 — 2026-03-01
Added
Phase 7 — Enterprise Export Backends
- DatadogExporter (spanforge.export.datadog) — async exporter that sends events as Datadog APM trace spans (via the local Agent) and as Datadog metrics series (via the public API). No ddtrace dependency.
- DatadogResourceAttributes — frozen dataclass with service, env, version, and extra fields; .to_tags() for tag-string serialisation.
- GrafanaLokiExporter (spanforge.export.grafana) — async exporter that pushes events to Grafana Loki via the /loki/api/v1/push HTTP endpoint. Supports multi-tenant deployments via X-Scope-OrgID.
- ConsumerRegistry / ConsumerRecord (spanforge.consumer) — thread-safe registry for declaring schema-namespace dependencies at startup. assert_compatible() raises IncompatibleSchemaError on version mismatches.
- EventGovernancePolicy (spanforge.governance) — dataclass policy with blocked types, deprecated-type warnings, and arbitrary custom rule callbacks. Module-level set_global_policy() / check_event().
- GovernanceViolationError, GovernanceWarning — governance exception and warning types.
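A Loki push request in JSON form carries a list of streams. Each stream has a label set and a list of [timestamp-in-nanoseconds-as-string, line] pairs. The sketch below builds such a request without sending it. build_loki_push and the timestamp_ns event key are hypothetical.

```python
import json
from typing import Any, Optional


def build_loki_push(
    events: list[dict[str, Any]],
    labels: dict[str, str],
    tenant_id: Optional[str] = None,
) -> tuple[dict[str, str], bytes]:
    """Build headers and body for POST /loki/api/v1/push (JSON form)."""
    values = [
        # Each entry is [<unix epoch in nanoseconds as a string>, <log line>].
        [str(e["timestamp_ns"]), json.dumps(e, sort_keys=True)]
        for e in events
    ]
    body = {"streams": [{"stream": labels, "values": values}]}
    headers = {"Content-Type": "application/json"}
    if tenant_id is not None:
        headers["X-Scope-OrgID"] = tenant_id  # selects the tenant in multi-tenant Loki
    return headers, json.dumps(body).encode("utf-8")
```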
Phase 8 — Ecosystem Integrations & Kafka
- EventStream.from_kafka() — classmethod constructor that drains a Kafka topic into an EventStream. Requires the optional extra kafka.
- DeprecationRegistry / DeprecationNotice (spanforge.deprecations) — structured per-event-type deprecation tracking with warn_if_deprecated() and list_deprecated().
- LLMSchemaCallbackHandler (spanforge.integrations.langchain) — LangChain BaseCallbackHandler that emits llm.trace.* events for all LLM and tool invocations. Requires the optional extra langchain.
- LLMSchemaEventHandler (spanforge.integrations.llamaindex) — LlamaIndex callback event handler. Requires the optional extra llamaindex.
Phase 9 — v2 Migration Framework
- SunsetPolicy (spanforge.migrate) — Enum classifying removal urgency: NEXT_MAJOR, NEXT_MINOR, LONG_TERM, UNSCHEDULED.
- DeprecationRecord (spanforge.migrate) — frozen dataclass capturing event_type, since, sunset, sunset_policy, replacement, migration_notes, and field_renames for structured migration guidance.
- v2_migration_roadmap() — returns all 9 deprecation records for event types that will change in v2.0, sorted by event_type.
- CLI: list-deprecated — prints all deprecation notices from the global registry.
- CLI: migration-roadmap [--json] — prints the v2 migration roadmap in human-readable or JSON form.
- CLI: check-consumers — lists all registered consumers and their compatibility status against the installed schema version.
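The migration records described above map naturally onto an Enum plus a frozen dataclass. This sketch mirrors the listed member and field names. The enum string values, the field defaults, the example event types, and the roadmap() helper are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SunsetPolicy(Enum):
    NEXT_MAJOR = "next_major"  # string values are assumed, names come from the entry
    NEXT_MINOR = "next_minor"
    LONG_TERM = "long_term"
    UNSCHEDULED = "unscheduled"


@dataclass(frozen=True)
class DeprecationRecord:
    event_type: str
    since: str
    sunset: str
    sunset_policy: SunsetPolicy
    replacement: Optional[str] = None
    migration_notes: str = ""
    field_renames: dict[str, str] = field(default_factory=dict)


def roadmap(records: list[DeprecationRecord]) -> list[DeprecationRecord]:
    """Return records sorted by event_type, as v2_migration_roadmap() is described."""
    return sorted(records, key=lambda r: r.event_type)
```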
Changed
- Version: 1.0.1 → 1.0.0-rc.1
- export/__init__.py now re-exports DatadogExporter, DatadogResourceAttributes, and GrafanaLokiExporter.
- Top-level spanforge package re-exports all Phase 7/8/9 public symbols.
Optional extras added
| Extra | Enables |
|---|---|
| kafka | EventStream.from_kafka() via kafka-python>=2.0 |
| langchain | LLMSchemaCallbackHandler via langchain-core>=0.2 |
| llamaindex | LLMSchemaEventHandler via llama-index-core>=0.10 |
| datadog | DatadogExporter (stdlib-only transport; extra reserved for future ddtrace integration) |
| all | All optional extras in one install target |
1.0.1 — 2026-03-01
Changed
- Python package renamed from llm_schema to spanforge. The import path is now import spanforge (or from spanforge import ...). The distribution name spanforge and all runtime behaviour are unchanged. This is the canonical, permanently stable import name.
- Version: 1.0.0 → 1.0.1
1.0.0 — 2026-03-01
General Availability release. The public API is now stable and covered by semantic versioning guarantees.
Added
- Compliance package (spanforge.compliance) — programmatic v1.0 compatibility checklist (CHK-1 through CHK-5), multi-tenant isolation verification, and audit chain integrity suite. All checks are callable without a pytest dependency.
- test_compatibility() — applies the five-point adoption checklist to any sequence of events. Powers the new spanforge check-compat CLI command.
- verify_tenant_isolation() / verify_events_scoped() — detect cross-tenant data leakage in multi-org deployments.
- verify_chain_integrity() — wraps verify_chain() with gap, tamper, and timestamp-monotonicity diagnostics.
- spanforge check-compat CLI sub-command — reads a JSON file of serialised events and prints compatibility violations.
- spanforge.migrate — MigrationResult dataclass and v1_to_v2() scaffold (raises NotImplementedError; full implementation ships in Phase 9).
- Performance benchmark test suite (tests/test_benchmarks.py, @pytest.mark.perf) validating all NFR targets.
Changed
- Version: 0.5.0 → 1.0.0
- PyPI classifier: Development Status :: 3 - Alpha → Development Status :: 5 - Production/Stable
0.5.0 — 2026-02-22
Added
- Namespace payload dataclasses for all 10 reserved namespaces (llm.trace.*, llm.cost.*, llm.cache.*, llm.diff.*, llm.eval.*, llm.fence.*, llm.guard.*, llm.prompt.*, llm.redact.*, llm.template.*). The llm.trace payload is FROZEN at v1 — no breaking changes permitted.
- schemas/v1.0/schema.json — published JSON Schema for the event envelope.
- validate_event() — validates an event against the JSON Schema with an optional jsonschema backend; falls back to structural stdlib checks.
0.4.0 — 2026-02-15
Added
- OTLPExporter — async OTLP/HTTP JSON exporter with retry, gzip compression, and configurable resource attributes.
- WebhookExporter — async HTTP webhook exporter with configurable headers, retry backoff, and timeout.
- JSONLExporter — synchronous JSONL file exporter with optional per-event gzip compression.
- EventStream — in-process event router with type filters, org/team scoping, sampling, and fan-out to multiple exporters.
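The routing behaviour of an event stream (type filter, org scoping, fan-out) can be condensed into a single function. route() is hypothetical. Events are plain dicts here, and sampling is omitted.

```python
from typing import Any, Callable, Iterable, Optional

Exporter = Callable[[dict[str, Any]], None]


def route(
    events: Iterable[dict[str, Any]],
    exporters: list[Exporter],
    *,
    event_types: Optional[set[str]] = None,
    org_id: Optional[str] = None,
) -> int:
    """Filter events by type and org, then fan each survivor out to every exporter."""
    delivered = 0
    for event in events:
        if event_types is not None and event.get("event_type") not in event_types:
            continue
        if org_id is not None and event.get("org_id") != org_id:
            continue
        for export in exporters:
            export(event)  # fan-out: every exporter sees the same event
        delivered += 1
    return delivered
```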
0.3.0 — 2026-02-08
Added
- sign() / verify() — HMAC-SHA256 event signing and verification (sha256: payload checksum + hmac-sha256: chain signature).
- verify_chain() — batch chain verification with gap detection and tampered-event identification.
- AuditStream — sequential event stream that signs and links every appended event via prev_id.
- Key rotation — AuditStream.rotate_key() emits a signed rotation event and switches the active HMAC key.
- assert_verified() — strict raising variant of verify().
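The sketch below verifies a prev_id-linked HMAC chain and flags both tampered events and gaps. The message layout signed by _sig (event_id, prev_id, previous signature, canonical JSON payload) is an assumption for illustration, not spanforge's wire format. hmac.compare_digest is used to avoid timing leaks.

```python
import hashlib
import hmac
import json
from typing import Any, Optional


def _sig(event: dict[str, Any], key: bytes, prev_sig: Optional[str]) -> str:
    body = json.dumps(event["payload"], sort_keys=True, separators=(",", ":"))
    msg = f"{event['event_id']}|{event['prev_id'] or ''}|{prev_sig or ''}|{body}"
    return "hmac-sha256:" + hmac.new(key, msg.encode("utf-8"), hashlib.sha256).hexdigest()


def append_signed(chain: list[dict[str, Any]], event_id: str, payload: dict, key: bytes) -> None:
    """Link the new event to the previous one via prev_id and a chained signature."""
    prev = chain[-1] if chain else None
    event = {"event_id": event_id, "prev_id": prev["event_id"] if prev else None, "payload": payload}
    event["signature"] = _sig(event, key, prev["signature"] if prev else None)
    chain.append(event)


def verify_chain_sketch(chain: list[dict[str, Any]], key: bytes) -> dict[str, list[str]]:
    tampered: list[str] = []
    gaps: list[str] = []
    prev = None
    for event in chain:
        if event["prev_id"] != (prev["event_id"] if prev else None):
            gaps.append(event["event_id"])  # prev_id does not point at the preceding event
        expected = _sig(event, key, prev["signature"] if prev else None)
        if not hmac.compare_digest(expected, event["signature"]):
            tampered.append(event["event_id"])
        prev = event
    return {"tampered": tampered, "gaps": gaps}
```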
0.2.0 — 2026-02-01
Added
- PII redaction framework — Redactable, Sensitivity, RedactionPolicy, RedactionResult, contains_pii(), assert_redacted().
- Pydantic v2 model layer — spanforge.models.EventModel with from_event() / to_event() round-trip and model_json_schema().
0.1.0 — 2026-01-25
Added
- Core Event dataclass — frozen, validated, zero external dependencies.
- EventType enum — exhaustive registry of all 50+ first-party event types across 10 namespaces plus audit types.
- ULID utilities — generate(), validate(), extract_timestamp_ms().
- Tags dataclass — arbitrary str → str metadata.
- JSON serialisation — Event.to_dict(), Event.to_json(), Event.from_dict(), Event.from_json().
- Event.validate() — full structural validation of all fields.
- is_registered(), validate_custom(), namespace_of() — event-type introspection helpers.
- Domain exceptions hierarchy — LLMSchemaError base with SchemaValidationError, ULIDError, SerializationError, DeserializationError, EventTypeError.
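A ULID's first 10 characters encode a 48-bit Unix timestamp in milliseconds, using Crockford's base32 alphabet (which omits I, L, O, and U). The sketch below decodes that timestamp. ulid_timestamp_ms and encode_timestamp are hypothetical helpers, not spanforge's extract_timestamp_ms.

```python
_CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
_DECODE = {ch: i for i, ch in enumerate(_CROCKFORD)}


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the 48-bit millisecond timestamp held in a ULID's first 10 characters."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    ts = 0
    for ch in ulid[:10].upper():
        if ch not in _DECODE:
            raise ValueError(f"invalid Crockford base32 character: {ch!r}")
        ts = ts * 32 + _DECODE[ch]  # each character contributes 5 bits
    if ts >= 2**48:
        raise ValueError("timestamp overflows 48 bits")  # first character must be 0-7
    return ts


def encode_timestamp(ms: int) -> str:
    """Inverse of the above: 10-character Crockford base32 encoding of a 48-bit timestamp."""
    if not 0 <= ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    chars = []
    for _ in range(10):
        chars.append(_CROCKFORD[ms % 32])
        ms //= 32
    return "".join(reversed(chars))
```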