You can’t govern what you can’t see.
AgentOBS is production observability for autonomous AI agents. Baseline behaviour, detect drift, enforce consent, and respond automatically — before regulators, users, or incident reports find the problem first.
See it in action.
Three scenarios. Three ways AgentOBS catches what your monitoring dashboards miss: consent violations, behavioural drift, and confidence breaches.
These are representative examples. Real output varies by agent configuration and playbook definitions.
Everything production AI needs.
Behavioural baselining
Instrument your agent at first deployment. Every subsequent run is automatically compared against established baselines — output distributions, confidence scores, token patterns, and decision frequencies.
Drift detection
Statistical drift detection using configurable thresholds. When outputs start deviating from baseline, AgentOBS alerts before users notice. Z-score and KL-divergence metrics out of the box.
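To make the two metrics concrete, here is a minimal sketch of a z-score check and a KL-divergence calculation using only the Python standard library. It is not the AgentOBS API: the function names, the sample confidence scores, and the threshold of 3.0 are assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev


def z_score(value, baseline):
    """Number of sample standard deviations between `value` and the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return (value - mu) / sigma


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same categories."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical baseline of confidence scores collected in staging.
BASELINE_CONFIDENCE = [0.90, 0.92, 0.88, 0.91, 0.89]
Z_THRESHOLD = 3.0  # assumed alerting threshold, configurable in practice


def is_drift(value, baseline=BASELINE_CONFIDENCE, threshold=Z_THRESHOLD):
    """Flag a single observation whose z-score exceeds the threshold."""
    return abs(z_score(value, baseline)) > threshold
```

A z-score suits a single scalar such as a confidence score, while KL divergence compares whole distributions, for example the share of approve, decline, and escalate decisions this week against the baseline.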
Consent boundary enforcement
Define exactly which data fields and sources your agent is permitted to access. AgentOBS monitors every decision for consent violations and escalates immediately when boundaries are breached.
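A consent boundary of this kind can be thought of as an allowlist check on each decision. The sketch below is one possible shape for it in plain Python. The `ConsentBoundary` class, the field names, and the source names are hypothetical and are not taken from the AgentOBS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentBoundary:
    """Hypothetical allowlist of data fields and sources an agent may use."""

    allowed_fields: frozenset
    allowed_sources: frozenset

    def violations(self, accessed_fields, source):
        """Return a list of violations for one decision; an empty list means compliant."""
        problems = [
            f"field:{name}"
            for name in sorted(set(accessed_fields) - self.allowed_fields)
        ]
        if source not in self.allowed_sources:
            problems.append(f"source:{source}")
        return problems


# Example boundary for an assumed credit-decision agent.
boundary = ConsentBoundary(
    allowed_fields=frozenset({"income", "credit_history"}),
    allowed_sources=frozenset({"crm"}),
)
```

Any non-empty result would be the trigger for escalation.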
Automated response playbooks
Pre-define runbooks for every alert type — pause the agent, escalate to a named responder, reroute to a fallback model, or log for later review. Playbooks execute in milliseconds.
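Conceptually, a playbook maps an alert type to an ordered list of actions. The following sketch shows that idea as a simple dispatch table. The alert types, action functions, and the `run_playbook` helper are illustrative assumptions and do not describe how AgentOBS defines or executes playbooks.

```python
from typing import Callable, Dict, List

Action = Callable[[dict], str]


# Hypothetical actions; each returns a short description of what it did.
def pause_agent(alert: dict) -> str:
    return f"paused {alert['agent']}"


def escalate(alert: dict) -> str:
    return f"escalated {alert['type']} to {alert.get('responder', 'on-call')}"


def reroute_to_fallback(alert: dict) -> str:
    return f"rerouted {alert['agent']} to fallback model"


def log_for_review(alert: dict) -> str:
    return f"logged {alert['type']} for later review"


# Assumed mapping from alert type to an ordered list of actions.
PLAYBOOKS: Dict[str, List[Action]] = {
    "consent_violation": [pause_agent, escalate],
    "drift": [reroute_to_fallback, log_for_review],
}


def run_playbook(alert: dict) -> List[str]:
    """Run every action registered for the alert type; unknown types are only logged."""
    actions = PLAYBOOKS.get(alert["type"], [log_for_review])
    return [action(alert) for action in actions]
```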
Human-in-the-loop hooks
Low-confidence decisions are automatically queued for human approval before any output reaches users or downstream systems. Configurable confidence thresholds per decision type.
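The gating logic described here can be sketched as a threshold lookup per decision type, with anything below the threshold held in a review queue. The decision types, threshold values, and the `route` function below are assumptions for illustration, not the AgentOBS interface.

```python
from collections import deque

# Hypothetical per-decision-type thresholds; higher-stakes decisions get stricter gates.
THRESHOLDS = {"credit_decision": 0.95, "faq_answer": 0.70}
DEFAULT_THRESHOLD = 0.80

review_queue = deque()  # decisions waiting for human approval


def route(decision_type, output, confidence):
    """Release an output, or hold it for human review if confidence is below threshold."""
    threshold = THRESHOLDS.get(decision_type, DEFAULT_THRESHOLD)
    if confidence < threshold:
        review_queue.append(
            {"type": decision_type, "output": output, "confidence": confidence}
        )
        return "queued_for_review"
    return "released"
```

With these assumed values, the same confidence of 0.90 is held for a credit decision but released for an FAQ answer.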
Immutable audit trail
Every decision, alert, playbook execution, and human review is logged with an immutable, timestamped record. Export-ready for regulators, auditors, and post-incident reviews.
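One common way to make a log tamper-evident is to chain records with HMAC signatures, so that each signature covers the record and the previous signature. The sketch below shows the general technique with Python's standard `hmac` module. The record layout, the genesis value, and the function names are assumptions, and this is not the chain format defined by the AGENTOBS standard.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder signature for the first entry


def _sign(key: bytes, prev_sig: str, record: dict) -> str:
    """HMAC-SHA256 over the previous signature plus the canonical JSON of the record."""
    payload = prev_sig.encode() + json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(chain: list, key: bytes, record: dict) -> None:
    """Append a record whose signature depends on every earlier entry."""
    prev_sig = chain[-1]["sig"] if chain else GENESIS
    chain.append({"record": record, "sig": _sign(key, prev_sig, record)})


def verify(chain: list, key: bytes) -> bool:
    """Recompute each signature in order; any edit or reordering breaks the chain."""
    prev_sig = GENESIS
    for entry in chain:
        expected = _sign(key, prev_sig, entry["record"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True
```

Because each signature depends on the one before it, altering an earlier record invalidates that entry and every entry after it.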
Up and running in an afternoon.
Instrument
Add the AgentOBS SDK to your agent. One function call per decision point.
Baseline
Run your agent in staging. AgentOBS establishes the behavioural baseline automatically.
Deploy
Ship to production with confidence. AgentOBS monitors every decision in real time.
Respond
Alerts trigger playbooks. Humans are looped in exactly when needed — no more, no less.
Built for regulated, high-stakes AI.
Financial services
Credit decisions, fraud detection, customer communication agents.
Healthcare
Clinical decision support, triage routing, patient-facing assistants.
Legal & compliance
Contract analysis, regulatory monitoring, compliance automation.
Enterprise operations
Procurement automation, HR decision support, internal knowledge agents.
The complete AgentOBS stack.
From the open standard to the production SDK and developer tooling — every layer of the observability stack is documented and ready to use.
RFC-0001 AGENTOBS
The schema specification at the core of the ecosystem. Defines the event envelope, 10 observability namespaces, HMAC audit chains, and conformance profiles.
Read the standard →
Python SDK agentobs
The reference implementation. It is pip-installable, has zero required dependencies, and covers all 10 namespaces, with a quickstart, integrations, and a CLI.
Explore the SDK →
Developer Tool AgentOBSDebug
Inspect, replay, and visualise AgentOBS traces. Timeline views, span trees, tool-call analysis, cost attribution, and trace diffing.
Explore AgentOBSDebug →
Compliance Tool AgentOBSValidate
Reference validation CLI and Python SDK. Validate JSON/JSONL event streams against the AGENTOBS schema, verify HMAC chains, and integrate with CI pipelines.
Explore AgentOBSValidate →
Know what your AI is doing. Always.
AgentOBS is SpanForge's production observability layer for autonomous AI agents. Instrument, baseline, and govern your agents from day one.