Skip to content

Metrics & Analytics

spanforge 2.0 ships two complementary APIs for extracting insights from recorded traces:

  • spanforge.metrics — batch aggregation over any Iterable[Event]
  • spanforge._store.TraceStore — in-process ring buffer for real-time programmatic access

spanforge.metrics

aggregate(events) -> MetricsSummary

Compute a comprehensive summary from any iterable of Event objects:

import spanforge
from spanforge.stream import EventStream

# From a JSONL file
events  = list(EventStream.from_file("events.jsonl"))
summary = spanforge.metrics.aggregate(events)

print(f"Traces  : {summary.trace_count}")
print(f"Success : {summary.agent_success_rate:.0%}")
print(f"p50 LLM : {summary.llm_latency_ms.p50:.0f} ms")
print(f"p95 LLM : {summary.llm_latency_ms.p95:.0f} ms")
print(f"Tokens  : {summary.total_input_tokens} in / {summary.total_output_tokens} out")
print(f"Cost    : ${summary.total_cost_usd:.4f}")
print(f"By model: {summary.cost_by_model}")

MetricsSummary fields

FieldTypeDescription
trace_countintNumber of distinct traces
span_countintTotal span count across all traces
agent_success_ratefloatFraction of traces with no error spans (0–1)
avg_trace_duration_msfloatMean end-to-end trace duration in ms
p50_trace_duration_msfloatMedian trace duration
p95_trace_duration_msfloat95th-percentile trace duration
total_input_tokensintSum of input_tokens across all LLM spans
total_output_tokensintSum of output_tokens across all LLM spans
total_cost_usdfloatSum of cost_usd across all spans
llm_latency_msLatencyStatsmin, max, p50, p95, p99 for LLM spans
tool_failure_ratefloatFraction of tool spans with status="error"
token_usage_by_modeldict[str, TokenUsage]Per-model input/output/total token sums
cost_by_modeldict[str, float]Per-model USD totals

Single-metric helpers

from spanforge import metrics

success   = metrics.agent_success_rate(events)    # float
latency   = metrics.llm_latency(events)           # LatencyStats
fail_rate = metrics.tool_failure_rate(events)      # float
by_model  = metrics.token_usage(events)           # dict[str, TokenUsage]

All helpers accept the same Iterable[Event] as aggregate().


TraceStore — in-process ring buffer

Enable the trace store to query recent traces programmatically without reading any files:

spanforge.configure(
    exporter="console",
    enable_trace_store=True,
    trace_store_size=200,   # keep the last 200 traces (default: 100)
)

Querying the store

# Get all events for a specific trace
events = spanforge.get_trace(trace_id)

# Get the most recently completed agent run
events = spanforge.get_last_agent_run()

# Get all tool-call spans in a trace
tool_spans = spanforge.list_tool_calls(trace_id)

# Get all LLM spans in a trace
llm_spans  = spanforge.list_llm_calls(trace_id)

Or use the TraceStore object directly:

from spanforge._store import get_store

store = get_store()
events = store.get_trace(trace_id)
store.clear()

Environment variable

export SPANFORGE_ENABLE_TRACE_STORE=1

Security note

The store holds event payloads in memory. When a RedactionPolicy is configured (spanforge.configure(redaction_policy=...)), events are redacted before being passed to store.record(). Raw PII never enters the ring buffer.

Memory overhead is bounded: trace_store_size × average_events_per_trace. At the default of 100 traces and ~20 events per trace, the store holds at most ~2 000 Event objects.