ADR-007: User Feedback Collection Design
Status: Accepted
Date: 2024-01-15
Deciders: SpanForge Core Team
Context
LLM response quality cannot be assessed fully through automated metrics alone. Explicit user feedback — whether a thumbs-up, a star rating, or a free-text comment — is a primary signal for alignment and trust evaluation.
SpanForge needs a standard way to:
- Collect multi-modal user feedback (binary, ordinal, free-text).
- Correlate feedback records with specific LLM traces and sessions.
- Optionally link feedback to T.R.U.S.T. dimensions for scoring.
- Provide privacy guarantees over free-text content and user identifiers.
- Aggregate feedback into session summaries for monitoring dashboards.
The user feedback feature (Phase 13, F-21) must follow SpanForge conventions
for local-first operation (ADR-004), singleton clients (ADR-002), and the llm.*
event namespace.
Decision
Namespace
Introduce a new llm.feedback.* namespace under spanforge.namespaces.feedback:
| Event | Payload class | Trigger |
|---|---|---|
| llm.feedback.submitted | FeedbackSubmittedPayload | sf_feedback.submit() |
| llm.feedback.summary | FeedbackSummaryPayload | sf_feedback.get_summary() |
| llm.feedback.trust_linked | (event dict) | sf_feedback.link_to_trust() |
Rating modalities
A FeedbackRating enum models all supported modalities:
| Modality | Values | Use case |
|---|---|---|
| Thumbs | THUMBS_UP, THUMBS_DOWN | Binary quick feedback |
| Star | STAR_1 – STAR_5 | 1–5 scale quality rating |
| Likert | LIKERT_1 – LIKERT_5 | Agreement / satisfaction scale |
| Free-text | FREE_TEXT | Open-ended qualitative comment |
Every rating exposes numeric_value() -> float | None. Thumbs, star, and Likert ratings
return a value normalised to [0.0, 1.0] for uniform aggregation; FREE_TEXT returns None.
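A minimal sketch of how such an enum might look. The member names come from the table above; the string values and the normalisation rule (thumbs map to 0.0 and 1.0, an n-of-5 rating maps to (n - 1) / 4) are assumptions made for illustration, not the confirmed SpanForge implementation.

```python
from __future__ import annotations

from enum import Enum


class FeedbackRating(Enum):
    # Member names follow the ADR; the string values are hypothetical.
    THUMBS_UP = "thumbs_up"
    THUMBS_DOWN = "thumbs_down"
    STAR_1 = "star_1"
    STAR_2 = "star_2"
    STAR_3 = "star_3"
    STAR_4 = "star_4"
    STAR_5 = "star_5"
    LIKERT_1 = "likert_1"
    LIKERT_2 = "likert_2"
    LIKERT_3 = "likert_3"
    LIKERT_4 = "likert_4"
    LIKERT_5 = "likert_5"
    FREE_TEXT = "free_text"

    def numeric_value(self) -> float | None:
        """Return a score in [0.0, 1.0], or None for free-text feedback."""
        if self is FeedbackRating.FREE_TEXT:
            return None
        if self is FeedbackRating.THUMBS_UP:
            return 1.0
        if self is FeedbackRating.THUMBS_DOWN:
            return 0.0
        # STAR_n and LIKERT_n: assumed linear mapping of 1..5 onto 0.0..1.0.
        step = int(self.name.rsplit("_", 1)[1])
        return (step - 1) / 4
```

Under this assumed mapping, STAR_3 and LIKERT_3 both land on the midpoint 0.5, so star and Likert ratings can be averaged together without modality-specific handling.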
SDK client
A new SFFeedbackClient follows the SFServiceClient base class pattern (ADR-002)
and is registered as the sf_feedback module-level singleton.
Privacy
Free-text comments and user identifiers are privacy-sensitive. The decision is:
- Comment text: SHA-256 hashed before storage; raw text is never persisted. The field name is comment_hash to make this explicit in the event schema.
- User ID: SHA-256 hashed before storage; the field name is user_id_hash.
- Session ID / Trace ID: Stored as-is. Callers must not embed PII in these identifiers.
SHA-256 is non-reversible and sufficient for event correlation without exposing identifiable content, consistent with the RAG tracing privacy model (ADR-006).
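The hashing step can be sketched with the standard library. The field names comment_hash and user_id_hash come from this ADR; the helper names sha256_hex and build_privacy_fields are hypothetical, and the sketch assumes the exact UTF-8 bytes are hashed with no prior normalisation.

```python
from __future__ import annotations

import hashlib


def sha256_hex(value: str) -> str:
    """Return the hex-encoded SHA-256 digest of the UTF-8 encoded value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_privacy_fields(user_id: str, comment: str | None = None) -> dict[str, str]:
    """Hash privacy-sensitive inputs so that no raw text reaches the event."""
    fields = {"user_id_hash": sha256_hex(user_id)}
    if comment is not None:
        # Only the digest is kept; the raw comment is discarded here.
        fields["comment_hash"] = sha256_hex(comment)
    return fields
```

Because the digest is deterministic, two events from the same user carry the same user_id_hash, which is what allows correlation without storing the identifier itself.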
T.R.U.S.T. linkage
The five T.R.U.S.T. dimensions (transparency, reliability, user_trust,
security, traceability) can be enriched with direct user feedback signals.
The linked_trust_dimension field on FeedbackSubmittedPayload and the
link_to_trust() method on SFFeedbackClient emit a llm.feedback.trust_linked
event that the T.R.U.S.T. scorecard can consume.
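As an illustration, a trust-linked event could be validated and assembled roughly as follows. The five dimension names and the event name are taken from this ADR, and the ValueError on an invalid dimension matches the SDK-level validation noted under Consequences. The function name build_trust_linked_event and the event dict keys are assumptions; the real event shape is not specified here.

```python
from __future__ import annotations

# The five T.R.U.S.T. dimensions named in this ADR.
TRUST_DIMENSIONS = frozenset(
    {"transparency", "reliability", "user_trust", "security", "traceability"}
)


def build_trust_linked_event(
    feedback_id: str, dimension: str, score: float | None = None
) -> dict:
    """Validate the dimension and build a hypothetical trust_linked event dict."""
    if dimension not in TRUST_DIMENSIONS:
        raise ValueError(
            f"Unknown T.R.U.S.T. dimension {dimension!r}; "
            f"expected one of {sorted(TRUST_DIMENSIONS)}"
        )
    return {
        "event_type": "llm.feedback.trust_linked",
        # Payload keys below are illustrative, not the confirmed schema.
        "payload": {
            "feedback_id": feedback_id,
            "linked_trust_dimension": dimension,
            "score": score,
        },
    }
```

Validating before the event is built means a misspelt dimension fails at the call site rather than surfacing later as an unmatched key in the scorecard.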
Alternatives considered
A. Generic key-value feedback schema
Rejected. A free-form schema cannot provide typed numeric_value() aggregation
or validated T.R.U.S.T. dimension references. The enum-based approach enables
positive_rate and average score computation without post-hoc parsing.
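To make the aggregation argument concrete, here is a sketch that works directly on numeric_value() outputs. The function name summarise, the returned keys, and the rule that a rating counts as positive when its value is strictly above 0.5 are all assumptions; this ADR does not define how positive_rate is computed.

```python
from __future__ import annotations

from statistics import fmean


def summarise(values: list[float | None], positive_threshold: float = 0.5) -> dict:
    """Aggregate normalised ratings; None entries (free text) are not scored."""
    scored = [v for v in values if v is not None]
    if not scored:
        return {
            "count": len(values),
            "scored_count": 0,
            "average_score": None,
            "positive_rate": None,
        }
    # Assumed rule: strictly above the threshold counts as positive.
    positive = sum(1 for v in scored if v > positive_threshold)
    return {
        "count": len(values),
        "scored_count": len(scored),
        "average_score": fmean(scored),
        "positive_rate": positive / len(scored),
    }
```

For example, summarise([1.0, 0.0, 0.5, None]) counts four records, scores three of them, and treats only the 1.0 entry as positive. No string parsing is involved, which is the advantage the enum-based schema has over a free-form one.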
B. Store raw comments with encryption at rest
Rejected. Encryption introduces key management complexity and does not eliminate the GDPR right-to-erasure obligation. Hashing is simpler, fully irreversible, and still enables deduplication and volume counting.
C. Separate feedback store independent of the SpanForge event bus
Rejected. Routing feedback through the standard SpanForge event pipeline (_emit_local)
ensures that all feedback records participate in existing export, audit, and
compliance pipelines. Bespoke storage would fragment observability data.
Consequences
Positive:
- Unified feedback collection with privacy-by-design.
- Numeric aggregation across thumbs, star, and Likert ratings via FeedbackRating.numeric_value().
- Direct T.R.U.S.T. integration enables feedback-driven trust scoring.
- Session summaries (get_summary()) provide real-time positive_rate metrics.
Negative:
- Free-text comments are not retrievable after submission; only the hash survives.
- get_summary() computes aggregates in-process from the in-memory store; it does not query a remote backend. Production deployments should configure an export backend to persist feedback events.
- linked_trust_dimension values are validated at SDK level; invalid values raise ValueError at runtime.