HallucCheck Integration Guide

DX-011 · Priority P0

This guide explains how to integrate SpanForge with HallucCheck to detect, flag, and audit hallucinated responses in LLM-powered pipelines.

Overview

HallucCheck provides a real-time hallucination scoring engine. SpanForge adds compliance-grade observability — immutable audit trails, PII redaction, trust scoring, and quality gates — around every HallucCheck evaluation.

Together they form a detect → redact → audit → gate feedback loop:

Detect — HallucCheck scores each LLM response for factual grounding.
Redact — SpanForge PII scanner strips personal data before persistence.
Audit — SpanForge appends a tamper-evident audit record.
Gate — SpanForge trust gate blocks responses that exceed the hallucination risk threshold.

Prerequisites

pip install spanforge[halluccheck]   # installs both packages

Set your project credentials:

export SPANFORGE_PROJECT_ID="my-project"
export SPANFORGE_API_KEY="sf_..."
export HALLUCCHECK_API_KEY="hc_..."

Quick Start

from spanforge.sdk import sf_observe, sf_pii, sf_audit, sf_gate
import halluccheck

# 1. Score the response
score = halluccheck.score(
    prompt="What is the capital of France?",
    response="The capital of France is Paris.",
    context=["France is a country in Europe whose capital is Paris."],
)

# 2. Build the event payload
event = {
    "halluc_score": score.value,
    "grounded": score.grounded,
    "prompt_hash": score.prompt_hash,
    "response_snippet": score.response[:200],
    "model": "gpt-4o",
}

# 3. Redact PII before persisting
redacted = sf_pii.redact(event)

# 4. Append to the immutable audit trail
sf_audit.append(redacted.event, schema_key="halluc_check_v1")

# 5. Emit an observability span
sf_observe.emit_span("halluc_check", attributes=redacted.event)

# 6. Gate — block if hallucination risk is critical
gate_result = sf_gate.evaluate("halluc-quality-gate", payload={
    "halluc_score": score.value,
    "grounded": score.grounded,
})
if gate_result.verdict != "PASS":
    raise RuntimeError(f"Hallucination gate failed: {gate_result.verdict}")

Gate YAML Configuration

Create a gate definition at gates/halluc-quality-gate.yaml:

gate_id: halluc-quality-gate
version: "1.0"
description: Block responses with high hallucination risk

rules:
  - field: halluc_score
    operator: ">="
    threshold: 0.7
    verdict: FAIL
    message: "Hallucination score {value} exceeds threshold 0.7"

  - field: grounded
    operator: "=="
    value: false
    verdict: FAIL
    message: "Response is not grounded in provided context"

Batch Evaluation

For offline evaluation of datasets:

import halluccheck
from spanforge.sdk import sf_pii, sf_audit

results = halluccheck.score_batch(dataset)

for result in results:
    event = {
        "halluc_score": result.value,
        "grounded": result.grounded,
        "row_id": result.row_id,
    }
    clean = sf_pii.redact(event)
    sf_audit.append(clean.event, schema_key="halluc_batch_v1")

print(f"Audited {len(results)} evaluations")

Trust Scorecard Integration

HallucCheck scores feed into the SpanForge Trust Scorecard's reliability dimension. No extra code is needed — the audit records are automatically consumed by the scorecard engine when schema_key starts with halluc_.

View your trust badge:

from spanforge.sdk import sf_trust

badge = sf_trust.get_badge()
print(f"Overall trust: {badge.overall:.0%} ({badge.colour_band})")

Compliance Evidence

Generate a compliance bundle that includes HallucCheck audit records:

from spanforge.sdk import sf_cec

bundle = sf_cec.build_bundle(
    project_id="my-project",
    date_range=("2025-01-01", "2025-06-30"),
)
print(f"Bundle: {bundle.bundle_id} — {bundle.zip_path}")

Testing with Mocks

Use the SpanForge mock library to test your HallucCheck integration without network calls:

from spanforge.testing_mocks import mock_all_services

def test_halluc_pipeline():
    with mock_all_services() as mocks:
        # ... run your pipeline ...
        mocks["sf_audit"].assert_called("append")
        mocks["sf_gate"].assert_called("evaluate")
        assert mocks["sf_pii"].call_count("redact") == 1

Troubleshooting

Symptom	Fix
`SFAuthError` on audit append	Check `SPANFORGE_API_KEY` is set and not expired
Gate always returns PASS	Verify `gates/halluc-quality-gate.yaml` exists and `halluc_score` field name matches
PII scanner misses entities	Run `spanforge doctor` to verify Presidio entity types are loaded
Trust scorecard missing reliability	Ensure `schema_key` starts with `halluc_`

Ready to instrument your AI pipeline?

Try the 30-second quickstart See the compliance checklist View on GitHub