Documentation navigation

llm.guard — Safety Classifier

Auto-documented module: spanforge.namespaces.guard

Field reference

Field	Type	Description
`classifier`	`str`	Classifier identifier (e.g. `"openai-moderation"`, `"llama-guard-2"`).
`direction`	`str`	`"input"` or `"output"` — which side of the model was classified.
`action`	`str`	Result: `"blocked"`, `"passed"`, `"flagged"`, `"modified"`, or `"escalated"`.
`score`	`float`	Classifier confidence score.
`score_min`	`float \| None`	Minimum of the scoring scale.
`score_max`	`float \| None`	Maximum of the scoring scale.
`threshold`	`float \| None`	Block threshold applied.
`categories`	`list[str]`	All harm categories evaluated by the classifier.
`triggered_categories`	`list[str]`	Categories that exceeded the block threshold.
`latency_ms`	`float \| None`	Classifier latency in milliseconds.
`policy_id`	`str \| None`	Policy identifier that applied this guard.
`span_id`	`str \| None`	Parent span identifier.
`content_hash`	`str \| None`	SHA-256 hash of the classified content (64 hex chars).

Example

from spanforge.namespaces.guard import GuardPayload

payload = GuardPayload(
    classifier="llama-guard-2",
    direction="input",
    action="blocked",
    score=0.91,
    categories=["violence", "self-harm"],
    triggered_categories=["self-harm"],
    threshold=0.8,
)

Ready to instrument your AI pipeline?

Try the 30-second quickstart See the compliance checklist View on GitHub