spanforge.cache — Semantic Cache Engine

Module: spanforge.cache
Added in: 1.0.7

The semantic cache engine deduplicates LLM calls by comparing the cosine similarity of incoming prompts to previously cached prompts. When a prompt is similar enough (controlled by similarity_threshold) the cached response is returned immediately — no model call, no tokens spent.
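The lookup logic can be sketched in plain Python (a toy illustration of the documented behavior, not the library's actual implementation; the `lookup` function and the `cache_entries` layout here are stand-ins):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookup(query_vec: list[float], cache_entries, threshold: float = 0.92):
    """Return the cached value whose embedding is most similar to the
    query, but only if that best similarity clears the threshold."""
    best_value, best_score = None, -1.0
    for vec, value in cache_entries:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_value, best_score = value, score
    return best_value if best_score >= threshold else None
```

Raising `similarity_threshold` toward 1.0 makes the cache stricter (fewer, safer hits); lowering it trades accuracy for more deduplication.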

All public names are re-exported from spanforge.cache and from the top-level spanforge namespace.


Quick example

from spanforge.cache import SemanticCache, InMemoryBackend, cached

# --- option A: explicit cache object ---
cache = SemanticCache(
    backend=InMemoryBackend(max_size=512),
    similarity_threshold=0.92,
    ttl_seconds=3600,
    namespace="my-app",
    emit_events=True,
)
async def ask_with_cache(prompt: str) -> str:
    cached_value = cache.get(prompt)
    if cached_value is not None:
        return cached_value
    result = await my_llm_call(prompt)
    cache.set(prompt, result)
    return result

# --- option B: @cached decorator ---
@cached(threshold=0.92, ttl=3600, emit_events=True)
async def ask(prompt: str) -> str:
    return await my_llm_call(prompt)

SemanticCache

class SemanticCache(
    backend: CacheBackend | None = None,
    similarity_threshold: float = 0.92,
    ttl_seconds: int = 3600,
    namespace: str = "default",
    embedder: Callable[[str], list[float]] | None = None,
    max_size: int = 1024,
    emit_events: bool = True,
)

Constructor parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| backend | CacheBackend \| None | None | Storage backend; if None, an InMemoryBackend(max_size) is created automatically |
| similarity_threshold | float | 0.92 | Minimum cosine similarity to count as a cache hit; range [0.0, 1.0] |
| ttl_seconds | int | 3600 | Seconds before a cache entry is considered stale |
| namespace | str | "default" | Logical partition; entries from different namespaces never collide |
| embedder | Callable[[str], list[float]] \| None | None | Custom embedding function; defaults to a lightweight built-in TF-IDF encoder |
| max_size | int | 1024 | Maximum capacity when creating the default InMemoryBackend |
| emit_events | bool | True | Emit llm.cache.* events on every hit, miss, write, or eviction |
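Any Callable[[str], list[float]] can serve as the embedder. A toy character-frequency embedder (illustrative only; a real deployment would plug in a proper sentence-embedding model here) might look like:

```python
from collections import Counter
import string

def char_frequency_embedder(text: str) -> list[float]:
    """Toy embedder: one dimension per lowercase ASCII letter, valued by
    relative character frequency. Crude, but it has the right shape:
    a deterministic str -> list[float] mapping."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts.get(c, 0) / total for c in string.ascii_lowercase]
```

It would then be passed as SemanticCache(embedder=char_frequency_embedder).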

Methods

SemanticCache.get(prompt)

def get(self, prompt: str) -> str | None

Compute the embedding for prompt, search for the nearest cached entry, and return the cached response string if the similarity is at or above the threshold. Returns None on a miss. Emits llm.cache.hit or llm.cache.miss when emit_events=True.

SemanticCache.set(prompt, value, tags=None)

def set(self, prompt: str, value: str, tags: list[str] | None = None) -> None

Store value in the backend keyed by the embedding of prompt. Optional tags can be used to group entries for bulk invalidation. Emits llm.cache.written when emit_events=True.

SemanticCache.invalidate_by_tag(tag)

def invalidate_by_tag(self, tag: str) -> int

Remove all entries whose tag list contains tag. Returns the number of entries removed. Each removal emits llm.cache.evicted with eviction_reason="manual_invalidation" when emit_events=True.
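The tag-matching semantics reduce to filtering the store, which can be sketched over a plain dict (an illustration, not the backend's actual code; the store layout here is an assumption):

```python
def invalidate_by_tag(store: dict, tag: str) -> int:
    """Remove every entry whose tag list contains `tag`; return the count.
    `store` maps key_hash -> (value, tags)."""
    doomed = [k for k, (_, tags) in store.items() if tag in tags]
    for k in doomed:
        del store[k]
    return len(doomed)
```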

SemanticCache.invalidate_all()

def invalidate_all(self) -> int

Flush the entire namespace. Returns the number of entries removed.


@cached decorator

from spanforge.cache import cached

The @cached decorator is available in bare and with-arguments forms.

Bare form

@cached
async def ask(prompt: str) -> str: ...

Uses default SemanticCache settings: threshold=0.92, ttl=3600, namespace="default", InMemoryBackend.

With-arguments form

@cached(
    threshold: float = 0.92,
    ttl: int = 3600,
    namespace: str = "default",
    backend: CacheBackend | None = None,
    tags: list[str] | None = None,
    emit_events: bool = True,
)
async def ask(prompt: str) -> str: ...

How the cache key is derived

The first positional str argument (or the keyword argument named prompt, query, text, or message) is used as the cache key. If no qualifying argument is found, the repr of the full (args, kwargs) pair is used as the key.
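The derivation rule above can be sketched as a standalone function (illustrative; derive_cache_key is not a public spanforge name):

```python
def derive_cache_key(args: tuple, kwargs: dict) -> str:
    """Mirror of the documented key-derivation rule: first positional str
    argument wins, then a recognized keyword, then the repr of everything."""
    for arg in args:
        if isinstance(arg, str):
            return arg
    for name in ("prompt", "query", "text", "message"):
        if name in kwargs and isinstance(kwargs[name], str):
            return kwargs[name]
    return repr((args, kwargs))
```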

Sync support

@cached works on both def and async def functions. For sync functions the cache get/set operations are performed synchronously in the calling thread.


Backend classes

InMemoryBackend

class InMemoryBackend(max_size: int = 1024)

LRU in-process store. Thread-safe. Data is lost when the process exits.
Good for: development, tests, single-process deployments.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| max_size | int | 1024 | Max entries before LRU eviction |

SQLiteBackend

class SQLiteBackend(db_path: str = "spanforge_cache.db")

Persistent store backed by stdlib sqlite3. No extra dependencies required. Safe for multi-threaded access within a single process. Data survives process restarts.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| db_path | str | "spanforge_cache.db" | Filesystem path to the SQLite database file |

RedisBackend

class RedisBackend(
    host: str = "localhost",
    port: int = 6379,
    db: int = 0,
    prefix: str = "spanforge:",
)

Distributed store via the optional redis package. Suitable for multi-process deployments, containers, or serverless functions that share a Redis instance.

Requires: pip install redis

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| host | str | "localhost" | Redis server hostname |
| port | int | 6379 | Redis server port |
| db | int | 0 | Redis logical database index |
| prefix | str | "spanforge:" | Key prefix; useful when sharing a Redis instance |

CacheEntry

@dataclass
class CacheEntry:
    key_hash: str
    value: str
    embedding: list[float]
    created_at: float          # Unix timestamp
    ttl_seconds: int
    namespace: str
    tags: list[str]
    similarity_score: float    # score from the lookup that produced this entry (hits only)

Returned by backend inspection methods. Not usually constructed by application code.


CacheBackendError

class CacheBackendError(SpanForgeError):
    backend: str    # e.g. "SQLiteBackend"
    reason: str

Raised when a backend operation fails (disk full, Redis connection refused, etc.). All backend errors are wrapped in CacheBackendError so callers can handle them without importing backend-specific exception classes.
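The wrapping pattern can be sketched as follows (the CacheBackendError defined here is a local stand-in so the snippet is self-contained; the real class subclasses SpanForgeError):

```python
class CacheBackendError(Exception):
    """Stand-in exception carrying the documented backend/reason fields."""
    def __init__(self, backend: str, reason: str):
        super().__init__(f"{backend}: {reason}")
        self.backend = backend
        self.reason = reason

def safe_write(write, backend_name: str) -> None:
    """Run a backend write, converting any backend-specific failure
    into a single, uniform exception type for callers to catch."""
    try:
        write()
    except Exception as exc:
        raise CacheBackendError(backend_name, str(exc)) from exc
```

Callers can then handle one exception type regardless of which backend is configured.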


Events emitted

When emit_events=True (the default), the following events are emitted using the globally configured exporter:

| Event type | Payload class | Condition |
| --- | --- | --- |
| llm.cache.hit | CacheHitPayload | Prompt similarity ≥ threshold |
| llm.cache.miss | CacheMissPayload | Prompt similarity < threshold or cache empty |
| llm.cache.written | CacheWrittenPayload | New entry stored |
| llm.cache.evicted | CacheEvictedPayload | Entry removed (TTL, LRU, or manual) |

Payload dataclasses are in spanforge.namespaces.cache. See docs/namespaces/cache.md for field-by-field documentation.


See also