llm.guard — Safety Classifier

Auto-documented module: spanforge.namespaces.guard

Field reference

| Field | Type | Description |
|---|---|---|
| `classifier` | `str` | Classifier identifier (e.g. `"openai-moderation"`, `"llama-guard-2"`). |
| `direction` | `str` | Which side of the model was classified: `"input"` or `"output"`. |
| `action` | `str` | Outcome of the check: `"blocked"`, `"passed"`, `"flagged"`, `"modified"`, or `"escalated"`. |
| `score` | `float` | Classifier confidence score. |
| `score_min` | `float \| None` | Minimum of the scoring scale. |
| `score_max` | `float \| None` | Maximum of the scoring scale. |
| `threshold` | `float \| None` | Block threshold that was applied. |
| `categories` | `list[str]` | All harm categories evaluated by the classifier. |
| `triggered_categories` | `list[str]` | Categories that exceeded the block threshold. |
| `latency_ms` | `float \| None` | Classifier latency, in milliseconds. |
| `policy_id` | `str \| None` | Identifier of the policy that applied this guard. |
| `span_id` | `str \| None` | Identifier of the parent span. |
| `content_hash` | `str \| None` | SHA-256 hash of the classified content, as 64 hexadecimal characters. |
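
The constraints implied by the field reference can be checked before a payload is built. The following is a minimal sketch, not part of spanforge: `check_guard_fields` and the constant names are hypothetical, and two of the rules are assumptions inferred from the descriptions (every triggered category was also evaluated, and `score` lies within `score_min`/`score_max` when those bounds are set).

```python
import re
from typing import Any

# Allowed values as listed in the field reference.
DIRECTIONS = {"input", "output"}
ACTIONS = {"blocked", "passed", "flagged", "modified", "escalated"}
# A SHA-256 digest rendered as 64 hexadecimal characters.
_SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def check_guard_fields(fields: dict[str, Any]) -> list[str]:
    """Return the problems found in a dict of guard payload fields.

    Hypothetical helper, not part of spanforge. An empty list means
    none of these checks failed.
    """
    problems: list[str] = []
    if fields.get("direction") not in DIRECTIONS:
        problems.append(f"direction must be one of {sorted(DIRECTIONS)}")
    if fields.get("action") not in ACTIONS:
        problems.append(f"action must be one of {sorted(ACTIONS)}")

    # Assumption: triggered categories are a subset of evaluated ones.
    evaluated = set(fields.get("categories", []))
    extra = set(fields.get("triggered_categories", [])) - evaluated
    if extra:
        problems.append(f"triggered but not evaluated: {sorted(extra)}")

    # Assumption: when scale bounds are given, score lies within them.
    score = fields.get("score")
    lo, hi = fields.get("score_min"), fields.get("score_max")
    if score is not None:
        if lo is not None and score < lo:
            problems.append("score is below score_min")
        if hi is not None and score > hi:
            problems.append("score is above score_max")

    content_hash = fields.get("content_hash")
    if content_hash is not None and not _SHA256_HEX.fullmatch(content_hash):
        problems.append("content_hash must be 64 hexadecimal characters")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.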

Example

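```python
from spanforge.namespaces.guard import GuardPayload

# An input-side check: the score of 0.91 is above the 0.8 block
# threshold, so the recorded action is "blocked".
payload = GuardPayload(
    classifier="llama-guard-2",
    direction="input",
    action="blocked",
    score=0.91,
    categories=["violence", "self-harm"],  # every category evaluated
    triggered_categories=["self-harm"],    # categories that exceeded the threshold
    threshold=0.8,
)
```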
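
The example leaves `latency_ms` and `content_hash` unset. A minimal sketch of how a caller might collect them follows; `run_guard` and its `classify` callable are hypothetical stand-ins for a real classifier client, and the sketch assumes the content is hashed as UTF-8 bytes and that "exceeded the threshold" means a score strictly greater than `threshold`.

```python
import hashlib
import time
from typing import Any, Callable


def run_guard(
    classify: Callable[[str], float],
    text: str,
    threshold: float = 0.8,
) -> dict[str, Any]:
    """Run a classifier and collect fields for a guard payload.

    Hypothetical helper: `classify` returns a confidence score for `text`.
    """
    # Time only the classifier call, then convert seconds to milliseconds.
    start = time.perf_counter()
    score = classify(text)
    latency_ms = (time.perf_counter() - start) * 1000.0

    return {
        "score": score,
        "threshold": threshold,
        # Assumption: "exceeded" means strictly greater than the threshold.
        "action": "blocked" if score > threshold else "passed",
        "latency_ms": latency_ms,
        # Assumption: the classified content is hashed as its UTF-8 bytes.
        "content_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
```

Passing the result together with the remaining fields, for instance `GuardPayload(classifier="llama-guard-2", direction="input", **fields)`, assumes `GuardPayload` accepts the documented field names as keyword arguments, as the example above suggests.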