
Tracing API

spanforge 2.0 ships a first-class Trace object and a start_trace() entry point that replace the low-level, implicitly propagated trace_id string. Everything in this guide builds on the async-safe, contextvars-based context propagation also introduced in 2.0.


Quick start

import spanforge

spanforge.configure(exporter="console", service_name="my-agent")

with spanforge.start_trace("research-agent") as trace:
    with trace.llm_call("gpt-4o", temperature=0.7) as span:
        result = call_llm(prompt)
        span.set_token_usage(input=512, output=200, total=712)
        span.set_status("ok")

    with trace.tool_call("web_search") as span:
        output = run_search("latest AI news")
        span.set_status("ok")

trace.print_tree()

start_trace(agent_name, **attributes) -> Trace

Opens a new trace for one agent run. Under the hood it pushes a root AgentRunContextManager onto the contextvars stack so all child spans automatically inherit the trace_id.

trace = spanforge.start_trace(
    "my-agent",
    service_name="backend",   # overrides configure() for this trace
    env="production",
)

start_trace() supports the sync and async context-manager protocols, plus imperative use with a manual trace.end():

# Sync
with spanforge.start_trace("my-agent") as trace:
    ...

# Async
async with spanforge.start_trace("my-agent") as trace:
    ...

# Imperative (manual close)
trace = spanforge.start_trace("my-agent")
try:
    ...
finally:
    trace.end()

Trace methods

trace.llm_call(model, *, temperature=None, top_p=None, max_tokens=None, **kwargs)

Opens a child span of type llm_call. The returned object is a SpanContextManager so it works as a context manager:

with trace.llm_call("gpt-4o", temperature=0.7, max_tokens=1024) as span:
    response = call_llm(...)
    span.set_token_usage(input=200, output=80, total=280)
    span.set_status("ok")

The temperature, top_p, and max_tokens values are stored in SpanPayload and appear in exported events.

trace.tool_call(tool_name, **kwargs)

Opens a child span of type tool_call:

with trace.tool_call("database_query") as span:
    rows = db.execute(query)
    span.set_status("ok")

trace.end()

Marks the trace as complete and flushes any pending spans. Called automatically when exiting the context manager.

trace.to_json(*, indent=None) -> str

Serialises the full trace (all accumulated spans) to a JSON string.

trace.save(path: str)

Writes the trace as NDJSON to the given file path.
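NDJSON means one JSON object per line with no enclosing array, so a saved trace can be read back with a line-by-line json.loads. A minimal stdlib sketch, assuming each line is one span record (the field names below are illustrative, not the exact export schema):

```python
import json
import os
import tempfile

# Hypothetical span records, shaped loosely like what trace.save() might emit.
spans = [
    {"span_type": "llm_call", "model": "gpt-4o", "status": "ok"},
    {"span_type": "tool_call", "tool_name": "web_search", "status": "ok"},
]

# NDJSON: one JSON object per line, no enclosing array.
with tempfile.NamedTemporaryFile("w", suffix=".ndjson", delete=False) as f:
    for span in spans:
        f.write(json.dumps(span) + "\n")
    path = f.name

# Reading it back is a line-by-line json.loads.
with open(path) as f:
    loaded = [json.loads(line) for line in f if line.strip()]

assert loaded == spans
os.remove(path)
```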

trace.print_tree(*, file=None)

Pretty-prints the span tree to stdout (or file). Delegates to spanforge.debug.print_tree().

trace.summary() -> dict

Returns aggregated statistics. Delegates to spanforge.debug.summary().


Span.add_event(name, metadata=None)

Record a named, timestamped event at any point during a span's lifetime:

with trace.llm_call("gpt-4o") as span:
    span.add_event("prompt_rendered", {"template": "v3", "tokens": 200})
    result = call_llm(...)
    span.add_event("response_received", {"finish_reason": "stop"})
    span.set_status("ok")

Events are included in the exported SpanPayload.events list and survive to_dict() / from_dict() round-trips.
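The round-trip property amounts to events being plain, timestamped records. A sketch of what such a record could look like — the field names here are illustrative, not the exact SpanPayload schema:

```python
import time
from dataclasses import dataclass, field, asdict

# Illustrative event record: a name, optional metadata, and a capture time.
@dataclass
class SpanEvent:
    name: str
    metadata: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> "SpanEvent":
        return cls(**d)

event = SpanEvent("prompt_rendered", {"template": "v3", "tokens": 200})
# Serialising and deserialising yields an equal record.
assert SpanEvent.from_dict(event.to_dict()) == event
```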


Error handling and categories

Span.record_error(exc, category="unknown_error")

from spanforge.types import SpanErrorCategory  # type alias

with trace.llm_call("gpt-4o") as span:
    try:
        result = call_llm(prompt)
    except TimeoutError as exc:
        span.record_error(exc, category="timeout_error")
        # error_category is set automatically for TimeoutError too
    except Exception as exc:
        span.record_error(exc, category="llm_error")

Built-in auto-mapping:

Exception type                          Auto-mapped category
TimeoutError, asyncio.TimeoutError      "timeout_error"
Exceptions in LLM integration spans     "llm_error"
Exceptions in tool call spans           "tool_error"
All others                              "unknown_error"
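The mapping above can be sketched as a simple dispatch. This is an illustration of the documented rules, not the library's actual implementation; span_type mirrors the llm_call / tool_call span kinds:

```python
import asyncio

# Timeouts win over span-type defaults; everything else falls through
# to the span-type category, then to "unknown_error".
def auto_category(exc: BaseException, span_type: str) -> str:
    if isinstance(exc, (TimeoutError, asyncio.TimeoutError)):
        return "timeout_error"
    if span_type == "llm_call":
        return "llm_error"
    if span_type == "tool_call":
        return "tool_error"
    return "unknown_error"

assert auto_category(TimeoutError(), "llm_call") == "timeout_error"
assert auto_category(ValueError(), "llm_call") == "llm_error"
assert auto_category(ValueError(), "tool_call") == "tool_error"
assert auto_category(ValueError(), "other") == "unknown_error"
```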

Span.set_timeout_deadline(seconds: float)

Start a background timer that closes the span with status="timeout" if it is not closed within seconds:

with trace.llm_call("gpt-4o") as span:
    span.set_timeout_deadline(30.0)   # fail-safe: mark as timeout after 30 s
    result = await asyncio.wait_for(call_llm_async(prompt), timeout=30.0)
    span.set_status("ok")
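The fail-safe pattern behind this can be sketched with a stdlib threading.Timer: a background timer flips the status to "timeout" unless the work finishes first and cancels it. This is an assumption about the mechanism, not spanforge's actual internals:

```python
import threading

# Minimal guard: first writer to self.status wins.
class TimeoutGuard:
    def __init__(self, seconds: float):
        self.status = None
        self._timer = threading.Timer(seconds, self._expire)
        self._timer.start()

    def _expire(self):
        if self.status is None:
            self.status = "timeout"

    def close(self, status="ok"):
        self._timer.cancel()
        if self.status is None:
            self.status = status

guard = TimeoutGuard(1.0)
guard.close("ok")              # work finished well within the deadline
assert guard.status == "ok"

slow = TimeoutGuard(0.01)
threading.Event().wait(0.1)    # simulate work that overruns the deadline
slow.close("ok")               # too late: the timer already fired
assert slow.status == "timeout"
```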

Async context managers

All context managers support async with natively:

async with spanforge.start_trace("async-agent") as trace:
    async with trace.llm_call("gpt-4o") as span:
        response = await async_call_llm(prompt)
        span.set_status("ok")

Span stacks are stored in contextvars.ContextVar so two concurrent asyncio.gather() tasks each maintain their own independent span hierarchy.
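The isolation guarantee comes straight from how asyncio handles contextvars: each task runs in its own copy of the context, so a set() in one task is invisible to the other. A stdlib-only sketch (the variable name is illustrative):

```python
import asyncio
import contextvars

current_span = contextvars.ContextVar("current_span", default=None)

async def agent(name: str) -> str:
    current_span.set(name)      # this task's "span stack" entry
    await asyncio.sleep(0.01)   # yield so the two tasks interleave
    return current_span.get()   # unaffected by the concurrent task

async def main():
    return await asyncio.gather(agent("a"), agent("b"))

results = asyncio.run(main())
assert results == ["a", "b"]
```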


Context propagation in threads

Use spanforge.copy_context() to carry the current span context into manually spawned threads or loop.run_in_executor call sites:

import concurrent.futures
import spanforge

ctx = spanforge.copy_context()

with concurrent.futures.ThreadPoolExecutor() as pool:
    future = pool.submit(ctx.run, my_function_that_creates_spans)
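This relies on the stdlib contextvars mechanism: worker threads start with a fresh context, so a ContextVar set in the main thread is only visible if you run the callable inside a copied Context. A self-contained demonstration (the trace id value is hypothetical):

```python
import concurrent.futures
import contextvars

trace_id = contextvars.ContextVar("trace_id", default=None)
trace_id.set("01JPEXAMPLE")  # hypothetical trace id set in the main thread

def read_trace_id():
    return trace_id.get()

ctx = contextvars.copy_context()
with concurrent.futures.ThreadPoolExecutor() as pool:
    # ctx.run executes the callable inside the copied context...
    with_ctx = pool.submit(ctx.run, read_trace_id).result()
    # ...while a plain submit sees the worker thread's default context.
    without_ctx = pool.submit(read_trace_id).result()

assert with_ctx == "01JPEXAMPLE"
assert without_ctx is None
```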

Raw tool I/O (opt-in)

By default, tool arguments and results are stored only as a SHA-256 hash. To store the raw strings (after redaction):

spanforge.configure(
    exporter="console",
    include_raw_tool_io=True,
    redaction_policy=policy,   # recommended when raw I/O is enabled
)

When enabled, ToolCall.arguments_raw and ToolCall.result_raw are populated. Any RedactionPolicy configured on the global config is applied to these values before storage.
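The default hash-only behaviour can be sketched with stdlib hashlib. The serialisation scheme below (sorted-key JSON before hashing) is an assumption for illustration, not spanforge's documented scheme:

```python
import hashlib
import json

# Tool arguments are reduced to a fixed-size digest; the raw content
# is not recoverable from what gets stored.
arguments = {"query": "latest AI news"}
raw = json.dumps(arguments, sort_keys=True)
digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()

assert len(digest) == 64                # hex-encoded SHA-256
assert "latest AI news" not in digest   # raw string does not leak
```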


Multi-agent workflows and cost rollup

Nested start_trace() contexts model multi-agent orchestration. When a child agent run exits, its total cost is automatically bubbled to the parent AgentRunContext so the parent's AgentRunPayload.total_cost reflects the full hierarchy:

import spanforge

spanforge.configure(exporter="jsonl", service_name="orchestrator")

with spanforge.start_trace("coordinator") as parent:
    # Coordinator's own LLM call
    with parent.llm_call("gpt-4o") as span:
        span.set_token_usage(input=200, output=100, total=300)
        span.set_status("ok")

    # Child agent — costs propagate to coordinator on exit
    with spanforge.start_trace("researcher") as child:
        with child.llm_call("claude-3-5-sonnet-20241022") as span:
            span.set_token_usage(input=800, output=400, total=1200)
            span.set_status("ok")

    # Another child agent
    with spanforge.start_trace("writer") as child:
        with child.llm_call("gpt-4o-mini") as span:
            span.set_token_usage(input=1000, output=500, total=1500)
            span.set_status("ok")

parent.print_tree()
# — Agent Run: coordinator  [3.4s]
#  ├─ LLM Call: gpt-4o  [0.5s]        in=200  out=100  $0.0015
#  ├─ Agent Run: researcher  [1.8s]
#  │   └─ LLM Call: claude-3-5-sonnet  [1.6s]  in=800  out=400  $0.0084
#  └─ Agent Run: writer  [1.1s]
#      └─ LLM Call: gpt-4o-mini  [0.9s] in=1000 out=500  $0.0005
#  Total (with children): $0.0104

Cost rollup is automatic — no manual wiring required. The AgentRunContextManager.__exit__ method detects the parent on the contextvars stack and calls parent.record_child_run_cost().
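The arithmetic behind the tree above is plain summation — each child run's total is added to its parent's on exit. Using the costs from the example output:

```python
# Per-run costs as shown in the print_tree() output above.
coordinator_own = 0.0015    # coordinator's own gpt-4o call
researcher_total = 0.0084   # bubbled up when the researcher trace exits
writer_total = 0.0005       # bubbled up when the writer trace exits

total = coordinator_own + researcher_total + writer_total
assert round(total, 4) == 0.0104   # matches "Total (with children)"
```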

Per-run cost report

After exporting events to JSONL, use the CLI to inspect any run:

spanforge cost run --run-id 01JPXXXXXXXX --input events.jsonl

See CLI reference — cost run for details.