
spanforge.sdk.rag — RAG Tracing Client

Module: spanforge.sdk.rag
Added in: 2.0.12 (Phase 13 — RAG Tracing & User Feedback)
Import: from spanforge.sdk import sf_rag or from spanforge.sdk.rag import SFRAGClient

spanforge.sdk.rag provides end-to-end tracing for Retrieval-Augmented Generation pipelines. It records query, retrieval, and generation spans with correlation across an entire RAG session, without storing raw query text or retrieved document content.


Quick example

from spanforge.sdk import sf_rag

# 1. Start a RAG session (returns a session_id to thread through)
session_id = sf_rag.trace_query(
    query="What is the capital of France?",
    top_k=5,
    retriever_name="chroma-main",
)

# 2. Record retrieval results
sf_rag.trace_retrieval(
    session_id,
    chunks=[
        {"chunk_id": "doc-42-p3", "score": 0.93, "content_hash": "abc", "source": "docs/geo.md"},
    ],
    total_found=12,
    latency_ms=45.2,
)

# 3. Record the generation span
sf_rag.trace_generation(
    session_id,
    "gpt-4o",
    chunk_ids_used=["doc-42-p3"],
    prompt_tokens=512,
    output_tokens=128,
    grounding_score=0.91,
    latency_ms=1230.0,
)

# 4. Finalise the session and get an aggregated summary
summary = sf_rag.end_session(session_id)
print(summary.total_queries)        # 1
print(summary.avg_grounding_score)  # 0.91
print(summary.status)               # "ok"

Singleton

spanforge.sdk.sf_rag is a module-level SFRAGClient instance. Import and use it directly for most use cases:

from spanforge.sdk import sf_rag

To construct a custom instance:

from spanforge.sdk.rag import SFRAGClient
from spanforge.sdk._base import SFClientConfig

client = SFRAGClient(SFClientConfig(api_key="..."))

Security model

| Data | Stored as |
| --- | --- |
| Raw query text | Never stored — SHA-256 hash only |
| Retrieved document text | Never stored |
| Chunk IDs (chunk_id) | Stored as-is — callers must not include PII |
| Grounding scores | Stored as floats |
| Token counts | Stored as integers |
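The hash-only policy above can be pictured with a short sketch. This is an illustration of the general approach (SHA-256 over UTF-8 text via the standard library's hashlib), not a statement of the SDK's exact internal digest format:

```python
import hashlib

def sha256_hex(text: str) -> str:
    # Hash UTF-8 text; only this digest would be stored, never the raw text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

query_hash = sha256_hex("What is the capital of France?")
chunk_hash = sha256_hex("Paris is the capital of France.")
```

The digest is a stable, irreversible fingerprint: equal queries hash identically, but the original text cannot be recovered from what is stored.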

RAGStatusInfo

@dataclass
class RAGStatusInfo:
    status: str
    active_sessions: int
    total_queries: int
    total_spans: int

Returned by SFRAGClient.get_status().

| Field | Description |
| --- | --- |
| status | "ok" or "degraded" |
| active_sessions | Number of sessions started but not yet finalised with end_session() |
| total_queries | Total trace_query() calls in this process lifetime |
| total_spans | Total trace_generation() calls in this process lifetime |
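As a usage sketch, a monitoring hook might poll get_status() and flag trouble. The threshold and alert logic below are hypothetical, not part of the SDK; RAGStatusInfo mirrors the dataclass documented above:

```python
from dataclasses import dataclass

@dataclass
class RAGStatusInfo:  # mirrors the dataclass documented above
    status: str
    active_sessions: int
    total_queries: int
    total_spans: int

def needs_attention(info: RAGStatusInfo, max_active: int = 100) -> bool:
    # Flag a degraded service, or a possible session leak (sessions started
    # but never finalised with end_session()) once active_sessions grows
    # past a caller-chosen threshold.
    return info.status == "degraded" or info.active_sessions > max_active
```

In practice this would be called as needs_attention(sf_rag.get_status()) from a periodic health check.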

SFRAGClient

class SFRAGClient(SFServiceClient)

Thread-safe RAG tracing service client.

Constructor

SFRAGClient(config: SFClientConfig)

trace_query(query, *, session_id=None, top_k=5, retriever_name="", embedding_model="", namespace="", filters=None) -> str

Record a RAG query event and start a new session.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | (required) | The user query text. SHA-256 hashed — raw text is NOT stored. |
| session_id | str \| None | None | Caller-supplied session ID. Auto-generated (ULID) if omitted. |
| top_k | int | 5 | Number of chunks requested from the retriever. |
| retriever_name | str | "" | Name/identifier of the vector store or retriever. |
| embedding_model | str | "" | Embedding model used to encode the query. |
| namespace | str | "" | Optional vector store namespace / collection. |
| filters | dict \| None | None | Metadata filters applied to the retrieval query. |

Returns: str — the session_id to pass to subsequent calls.


trace_retrieval(session_id, *, chunks, total_found=0, latency_ms=0.0, status="ok", error_message=None) -> None

Record retrieval results for an active session.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| session_id | str | (required) | Session ID returned by trace_query(). |
| chunks | list[dict] | (required) | List of chunk dicts, each with chunk_id, score, content_hash, and source. |
| total_found | int | 0 | Total matching chunks before top_k truncation. |
| latency_ms | float | 0.0 | Retrieval latency in milliseconds. |
| status | str | "ok" | "ok", "partial", "error", or "timeout". |
| error_message | str \| None | None | Error detail when status is "error" or "timeout". |

Note: Unknown session_id values are silently ignored (no error raised).
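Failed lookups are worth tracing too: an error path can record an empty chunk list with an error status. The helper below is a hypothetical sketch, not part of the SDK; it is written against any client exposing the trace_retrieval() signature documented above (the client is passed in, which also makes the pattern easy to test):

```python
import time

def traced_search(client, session_id: str, retriever) -> list[dict]:
    # Hypothetical helper: run a zero-argument retriever callable and
    # trace either the results or the failure on the given session.
    start = time.perf_counter()
    try:
        chunks = retriever()
    except Exception as exc:
        client.trace_retrieval(
            session_id,
            chunks=[],
            total_found=0,
            latency_ms=(time.perf_counter() - start) * 1000,
            status="error",
            error_message=str(exc),
        )
        raise  # tracing recorded the failure; let the caller handle it
    client.trace_retrieval(
        session_id,
        chunks=chunks,
        total_found=len(chunks),
        latency_ms=(time.perf_counter() - start) * 1000,
    )
    return chunks
```

With the singleton, this would be called as traced_search(sf_rag, session_id, lambda: vector_db.search(q)).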


trace_generation(session_id, model, *, chunk_ids_used, prompt_tokens=0, output_tokens=0, context_tokens=0, grounding_score=None, latency_ms=0.0, status="ok", error_message=None, span_name="generation") -> None

Record a generation span linked to retrieved chunks.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| session_id | str | (required) | Session ID returned by trace_query(). |
| model | str | (required) | LLM model identifier (e.g. "gpt-4o"). |
| chunk_ids_used | list[str] | (required) | Chunk IDs whose content was passed to the LLM. |
| prompt_tokens | int | 0 | Number of input tokens. |
| output_tokens | int | 0 | Number of output tokens. |
| context_tokens | int | 0 | Total context tokens (retrieved chunk content). |
| grounding_score | float \| None | None | Grounding score in [0.0, 1.0]. Measures how well the answer is grounded in retrieved content. |
| latency_ms | float | 0.0 | Generation latency in milliseconds. |
| status | str | "ok" | "ok", "error", or "timeout". |
| error_message | str \| None | None | Error detail when status is not "ok". |
| span_name | str | "generation" | Label for this generation span. |

Note: Unknown session_id values are silently ignored.
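Because grounding_score must fall in [0.0, 1.0], callers deriving it from a raw similarity metric may want to clamp first. A trivial sketch (the helper name is ours, not the SDK's):

```python
def clamp_grounding(raw: float) -> float:
    # Constrain a raw similarity value to the [0.0, 1.0] range
    # that trace_generation() expects for grounding_score.
    return max(0.0, min(1.0, raw))
```

For example: sf_rag.trace_generation(session_id, "gpt-4o", chunk_ids_used=ids, grounding_score=clamp_grounding(raw_similarity)).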


end_session(session_id) -> RAGSessionPayload

Finalise a RAG session and return an aggregated summary.

| Parameter | Type | Description |
| --- | --- | --- |
| session_id | str | Active session to close. |

Returns: RAGSessionPayload — summary of all queries, chunks used, token counts, and average grounding score.

Raises: KeyError — if session_id has not been started or was already ended.
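Since end_session() raises KeyError for unknown or already-ended sessions, code paths that might double-close can use a defensive wrapper. This is a hypothetical helper written against any client with the documented signature, not an SDK feature:

```python
def end_session_safely(client, session_id: str):
    # Return the session summary, or None if the session was unknown
    # or already finalised (end_session() raises KeyError in both cases).
    try:
        return client.end_session(session_id)
    except KeyError:
        return None
```

With the singleton: summary = end_session_safely(sf_rag, session_id).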


get_session(session_id) -> RAGSessionPayload | None

Return an in-progress session summary without ending it.

Returns: RAGSessionPayload if the session exists, None otherwise.


get_status() -> RAGStatusInfo

Return service health and session statistics.

status = sf_rag.get_status()
print(status.status)           # "ok"
print(status.active_sessions)  # 3
print(status.total_queries)    # 47

@trace_rag decorator (F-20)

Added in: 2.0.14 — lives in spanforge.auto; documented here for discoverability.

from spanforge.auto import trace_rag

@trace_rag
def my_retriever(query: str) -> list[dict]:
    ...

The @trace_rag decorator wraps any callable retrieval function and emits the same RAG tracing spans (trace_query + trace_retrieval) that the auto-instrumentation patch emits for LlamaIndex and LangChain.

Use this decorator when:

  • You have a custom retrieval function (not backed by LlamaIndex or LangChain).
  • You want explicit, per-function instrumentation rather than global monkey-patching via spanforge.auto.setup().

from spanforge.auto import trace_rag

@trace_rag
def search(query: str) -> list[dict]:
    return vector_db.search(query, top_k=5)

results = search("What is the T.R.U.S.T. score threshold?")
# RAG query + retrieval spans emitted automatically

All tracing is best-effort — any failure inside sf_rag is silently swallowed so the decorated function always runs normally.
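This best-effort behaviour can be pictured as a wrapper that swallows tracing failures. The sketch below illustrates the pattern only; it is not @trace_rag's actual implementation:

```python
import functools

def best_effort(trace_fn):
    # Decorator factory: call trace_fn after the wrapped function, but never
    # let a tracing failure stop the wrapped function from running normally.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            try:
                trace_fn(args, kwargs, result)
            except Exception:
                pass  # tracing is best-effort: failures are swallowed
            return result
        return wrapper
    return decorator
```

Even if trace_fn raises on every call, the decorated function's return value reaches the caller unchanged.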

For the full decorator reference, see spanforge.auto.


Related