
Namespace: llm.rag.* — RAG Retrieval Payloads

Module: spanforge.namespaces.retrieval
Added in: 2.0.12 (Phase 13)
Events covered: llm.rag.query, llm.rag.retrieved, llm.rag.generated, llm.rag.session

This namespace covers all payload types emitted during a Retrieval-Augmented Generation (RAG) pipeline trace. Use spanforge.sdk.rag to emit these events without constructing the dataclasses directly.


Classes

| Class | Event | Description |
|---|---|---|
| RetrievedChunk | (value object) | A single retrieved document chunk with its relevance score |
| RetrievalQueryPayload | llm.rag.query | Hashed user query and retriever configuration |
| RetrievalResultPayload | llm.rag.retrieved | List of chunks retrieved from the vector store |
| RAGSpanPayload | llm.rag.generated | LLM generation span over the retrieved context |
| RAGSessionPayload | llm.rag.session | End-to-end RAG session summary |

RetrievedChunk

A value object representing one retrieved document chunk. Used inside RetrievalResultPayload.chunks.

@dataclass
class RetrievedChunk:
    chunk_id: str
    content_hash: str
    score: float
    source: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)

| Field | Type | Validation | Description |
|---|---|---|---|
| chunk_id | str | non-empty | Unique identifier for the chunk within the document store |
| content_hash | str | | SHA-256 hex digest of the chunk text (raw text is never stored) |
| score | float | [0.0, 1.0] | Relevance / similarity score |
| source | str | | Document source (e.g. URI, filename, database key) |
| metadata | dict | | Arbitrary key-value metadata attached to the chunk |

Methods: to_dict() -> dict, from_dict(data: dict) -> RetrievedChunk
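The sketch below shows how a chunk's `content_hash` can be produced so that only the digest is kept. It also shows what a `to_dict()` / `from_dict()` round trip is expected to preserve. It defines a local stand-in class with the documented fields instead of importing spanforge, so the real class may serialize or validate differently. The UTF-8 encoding before hashing and the helper name `sha256_hex` are assumptions, and the sample values are illustrative.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from typing import Any


def sha256_hex(text: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of UTF-8 encoded text (encoding assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class RetrievedChunk:
    """Local stand-in mirroring the documented fields, not the spanforge class."""

    chunk_id: str
    content_hash: str
    score: float
    source: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the validation rules listed in the field table.
        if not self.chunk_id:
            raise ValueError("chunk_id must be non-empty")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be in [0.0, 1.0]")

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "RetrievedChunk":
        return cls(**data)


text = "Entropy is a measure of disorder."
chunk = RetrievedChunk(
    chunk_id="thermo.md#3",
    content_hash=sha256_hex(text),  # store the digest, never the text itself
    score=0.87,
    source="docs/thermo.md",
    metadata={"page": 12},
)
restored = RetrievedChunk.from_dict(chunk.to_dict())
```

Hashing at the edge means the serialized event can identify a chunk without ever carrying its contents. Re-hashing the same text later reproduces the same digest for correlation.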


RetrievalQueryPayload

Payload for llm.rag.query events. Captures the user query and retriever configuration without storing raw query text.

@dataclass
class RetrievalQueryPayload:
    session_id: str
    query_hash: str
    top_k: int = 5
    retriever_name: str = ""
    embedding_model: str = ""
    namespace: str = ""
    latency_ms: float = 0.0
    filters: dict[str, Any] = field(default_factory=dict)

| Field | Type | Validation | Description |
|---|---|---|---|
| session_id | str | non-empty | RAG session this query belongs to |
| query_hash | str | | SHA-256 hex digest of the query text (raw text NOT stored) |
| top_k | int | >= 1 | Number of chunks requested from the retriever |
| retriever_name | str | | Name / identifier of the vector store or retriever |
| embedding_model | str | | Embedding model used to encode the query |
| namespace | str | | Optional vector store namespace / collection |
| latency_ms | float | >= 0 | Time taken to submit the query, in milliseconds |
| filters | dict | | Metadata filters applied to the retrieval query |

Methods: to_dict() -> dict, from_dict(data: dict) -> RetrievalQueryPayload
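As a rough illustration, the query can be hashed before it reaches the payload, and the documented constraints (`session_id` non-empty, `top_k >= 1`, `latency_ms >= 0`) can be checked on construction. The class here is a local stand-in, and `make_query_payload` is a hypothetical helper. The retriever and model names are made-up sample values. Whether spanforge validates at construction time is not stated on this page.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RetrievalQueryPayload:
    """Local stand-in with the documented fields and validation rules."""

    session_id: str
    query_hash: str
    top_k: int = 5
    retriever_name: str = ""
    embedding_model: str = ""
    namespace: str = ""
    latency_ms: float = 0.0
    filters: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.session_id:
            raise ValueError("session_id must be non-empty")
        if self.top_k < 1:
            raise ValueError("top_k must be >= 1")
        if self.latency_ms < 0:
            raise ValueError("latency_ms must be >= 0")


def make_query_payload(session_id: str, query: str, **kwargs: Any) -> RetrievalQueryPayload:
    """Hypothetical helper: hash the raw query so only its digest is stored."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return RetrievalQueryPayload(session_id=session_id, query_hash=digest, **kwargs)


payload = make_query_payload(
    "sess-001",
    "What is entropy?",
    top_k=3,
    retriever_name="docs-index",        # illustrative name
    embedding_model="example-embedder",  # illustrative name
    filters={"lang": "en"},
)
```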


RetrievalResultPayload

Payload for llm.rag.retrieved events. Contains the ordered list of chunks returned from the retriever.

@dataclass
class RetrievalResultPayload:
    session_id: str
    query_hash: str
    chunks: list[RetrievedChunk] = field(default_factory=list)
    total_found: int = 0
    latency_ms: float = 0.0
    status: Literal["ok", "partial", "error", "timeout"] = "ok"
    error_message: str | None = None

| Field | Type | Validation | Description |
|---|---|---|---|
| session_id | str | non-empty | RAG session this result belongs to |
| query_hash | str | | SHA-256 digest of the triggering query |
| chunks | list[RetrievedChunk] | | Ordered list of retrieved chunks (top-k) |
| total_found | int | | Total matching chunks before top_k truncation |
| latency_ms | float | >= 0 | Retrieval latency in milliseconds |
| status | str | enum | "ok", "partial", "error", or "timeout" |
| error_message | str \| None | | Error detail when status is "error" or "timeout" |

Methods: to_dict() -> dict, from_dict(data: dict) -> RetrievalResultPayload
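To show how `total_found` relates to `chunks`, the sketch below keeps the top `top_k` hits but records how many matched before truncation. It uses trimmed local stand-ins rather than the spanforge classes, and `build_result` is a hypothetical helper. Ordering by descending score is an assumption, because the page says only "ordered list". A timeout result is included to show the `status` / `error_message` pairing.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class RetrievedChunk:
    """Trimmed stand-in (metadata omitted)."""

    chunk_id: str
    content_hash: str
    score: float
    source: str = ""


@dataclass
class RetrievalResultPayload:
    """Local stand-in with the documented fields."""

    session_id: str
    query_hash: str
    chunks: list[RetrievedChunk] = field(default_factory=list)
    total_found: int = 0
    latency_ms: float = 0.0
    status: Literal["ok", "partial", "error", "timeout"] = "ok"
    error_message: Optional[str] = None


def build_result(
    session_id: str, query_hash: str, hits: list[RetrievedChunk], top_k: int, latency_ms: float
) -> RetrievalResultPayload:
    """Hypothetical helper: rank hits by score, keep top_k, count all matches."""
    ranked = sorted(hits, key=lambda c: c.score, reverse=True)
    return RetrievalResultPayload(
        session_id=session_id,
        query_hash=query_hash,
        chunks=ranked[:top_k],
        total_found=len(hits),  # counted before top_k truncation
        latency_ms=latency_ms,
    )


hits = [
    RetrievedChunk("a", "h1", 0.42),
    RetrievedChunk("b", "h2", 0.91),
    RetrievedChunk("c", "h3", 0.77),
    RetrievedChunk("d", "h4", 0.15),
]
result = build_result("sess-001", "q-hash", hits, top_k=2, latency_ms=38.5)

# A retriever that gave up returns no chunks but explains why.
timed_out = RetrievalResultPayload(
    "sess-001", "q-hash", status="timeout",
    error_message="retriever did not respond within 2000 ms",
)
```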


RAGSpanPayload

Payload for llm.rag.generated events. Represents the LLM generation step that consumes retrieved context.

@dataclass
class RAGSpanPayload:
    session_id: str
    span_name: str
    model: str
    chunk_ids_used: list[str] = field(default_factory=list)
    context_tokens: int = 0
    prompt_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    status: Literal["ok", "error", "timeout"] = "ok"
    grounding_score: float | None = None
    error_message: str | None = None

| Field | Type | Validation | Description |
|---|---|---|---|
| session_id | str | non-empty | RAG session this span belongs to |
| span_name | str | | Human-readable label for this generation step |
| model | str | non-empty | Model identifier (e.g. "gpt-4o") |
| chunk_ids_used | list[str] | | Chunk IDs whose content was passed to the LLM |
| context_tokens | int | | Tokens consumed by the retrieved context |
| prompt_tokens | int | | Total prompt tokens (context + instruction) |
| output_tokens | int | | Tokens in the generated response |
| latency_ms | float | >= 0 | Total generation latency in milliseconds |
| status | str | enum | "ok", "error", or "timeout" |
| grounding_score | float \| None | [0.0, 1.0] | How well the output is grounded in retrieved content |
| error_message | str \| None | | Present when status is not "ok" |

Methods: to_dict() -> dict, from_dict(data: dict) -> RAGSpanPayload
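The field table implies a few consistency rules for a generation span. The sketch below collects them in a hypothetical `check_span` helper over a local stand-in class. The `context_tokens <= prompt_tokens` rule is inferred from the description of `prompt_tokens` as "context + instruction". The `error_message` rule comes from "Present when status is not ok". Whether spanforge itself enforces either rule is an assumption this page does not settle.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class RAGSpanPayload:
    """Local stand-in with the documented fields."""

    session_id: str
    span_name: str
    model: str
    chunk_ids_used: list[str] = field(default_factory=list)
    context_tokens: int = 0
    prompt_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    status: Literal["ok", "error", "timeout"] = "ok"
    grounding_score: Optional[float] = None
    error_message: Optional[str] = None


def check_span(span: RAGSpanPayload) -> list[str]:
    """Hypothetical helper: return problems found; an empty list means consistent."""
    problems = []
    if not span.session_id:
        problems.append("session_id must be non-empty")
    if not span.model:
        problems.append("model must be non-empty")
    if span.latency_ms < 0:
        problems.append("latency_ms must be >= 0")
    if span.grounding_score is not None and not 0.0 <= span.grounding_score <= 1.0:
        problems.append("grounding_score must be in [0.0, 1.0]")
    # prompt_tokens is context + instruction, so it cannot be smaller than context_tokens.
    if span.context_tokens > span.prompt_tokens:
        problems.append("context_tokens exceeds prompt_tokens")
    if span.status != "ok" and not span.error_message:
        problems.append("error_message expected when status is not 'ok'")
    return problems


ok_span = RAGSpanPayload(
    session_id="sess-001", span_name="answer", model="gpt-4o",
    chunk_ids_used=["b", "c"], context_tokens=820, prompt_tokens=950,
    output_tokens=140, latency_ms=1210.0, grounding_score=0.9,
)
bad_span = RAGSpanPayload(
    session_id="sess-001", span_name="answer", model="gpt-4o", status="timeout",
)
instruction_tokens = ok_span.prompt_tokens - ok_span.context_tokens  # 950 - 820
```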


RAGSessionPayload

Payload for llm.rag.session events. An aggregated summary of a complete RAG session, emitted when sf_rag.end_session() is called.

@dataclass
class RAGSessionPayload:
    session_id: str
    retriever_name: str
    total_queries: int = 0
    total_chunks_retrieved: int = 0
    unique_chunk_ids: list[str] = field(default_factory=list)
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    avg_grounding_score: float | None = None
    total_latency_ms: float = 0.0
    started_at: str = ""
    status: str = "ok"

| Field | Type | Description |
|---|---|---|
| session_id | str | The session identifier |
| retriever_name | str | Primary retriever used in this session |
| total_queries | int | Number of trace_query() calls in the session |
| total_chunks_retrieved | int | Total chunks retrieved across all retrievals |
| unique_chunk_ids | list[str] | Deduplicated list of all chunk IDs referenced |
| total_input_tokens | int | Sum of all prompt_tokens across generation spans |
| total_output_tokens | int | Sum of all output_tokens across generation spans |
| avg_grounding_score | float \| None | Mean grounding score; None if no scores recorded |
| total_latency_ms | float | Combined latency across all spans in ms |
| started_at | str | ISO 8601 timestamp when the session was started |
| status | str | Overall session status ("ok" or "error") |

Methods: to_dict() -> dict, from_dict(data: dict) -> RAGSessionPayload
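The aggregate fields can be read as simple reductions over the session's generation spans. The sketch below computes them in a hypothetical `summarize` function over a minimal hypothetical `Span` record, returning a dict keyed by the `RAGSessionPayload` field names. `session_id`, `retriever_name` and `started_at` are omitted. Several choices are assumptions, not the SDK's documented behaviour. `unique_chunk_ids` is drawn from `chunk_ids_used` in first-seen order. `total_latency_ms` sums generation spans only. Any non-"ok" span marks the session "error".

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Span:
    """Hypothetical minimal record of one generation span."""

    chunk_ids_used: list[str]
    prompt_tokens: int
    output_tokens: int
    latency_ms: float
    grounding_score: Optional[float] = None
    status: str = "ok"


def summarize(spans: list[Span], total_queries: int, total_chunks_retrieved: int) -> dict[str, Any]:
    """Hypothetical aggregation following the field descriptions."""
    scores = [s.grounding_score for s in spans if s.grounding_score is not None]
    # dict.fromkeys deduplicates while keeping first-seen order.
    unique_ids = list(dict.fromkeys(cid for s in spans for cid in s.chunk_ids_used))
    return {
        "total_queries": total_queries,
        "total_chunks_retrieved": total_chunks_retrieved,
        "unique_chunk_ids": unique_ids,
        "total_input_tokens": sum(s.prompt_tokens for s in spans),
        "total_output_tokens": sum(s.output_tokens for s in spans),
        "avg_grounding_score": sum(scores) / len(scores) if scores else None,
        "total_latency_ms": sum(s.latency_ms for s in spans),
        "status": "ok" if all(s.status == "ok" for s in spans) else "error",
    }


spans = [
    Span(["b", "c"], prompt_tokens=950, output_tokens=140, latency_ms=1210.0, grounding_score=0.875),
    Span(["c", "d"], prompt_tokens=700, output_tokens=90, latency_ms=800.0, grounding_score=0.625),
    Span(["d"], prompt_tokens=400, output_tokens=60, latency_ms=500.0),  # no score recorded
]
summary = summarize(spans, total_queries=3, total_chunks_retrieved=6)
```

Spans without a grounding score are left out of the mean rather than counted as zero. This matches "None if no scores recorded" for the case where no span has a score.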


Usage example

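import hashlib

from spanforge.namespaces.retrieval import RetrievalResultPayload, RetrievedChunk

# Direct construction: hash the text yourself, since raw text is never stored.
chunk = RetrievedChunk(
    chunk_id="thermo.md#3",
    content_hash=hashlib.sha256(b"Entropy is a measure of disorder.").hexdigest(),
    score=0.87,
    source="docs/thermo.md",
)
result = RetrievalResultPayload(
    session_id="sess-001",
    query_hash=hashlib.sha256(b"What is entropy?").hexdigest(),
    chunks=[chunk],
    total_found=1,
)
payload_dict = result.to_dict()

# Prefer the high-level client, which emits these events without manual construction:
from spanforge.sdk import sf_rag
session_id = sf_rag.trace_query("What is entropy?", top_k=3)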

Related