Diagnostic response headers

For operational visibility, Sieve attaches a small set of diagnostic headers to its responses to intercepted /api/chat and /v1/chat/completions requests. These headers are stable across releases, so you can write scripts and monitoring that depend on them.

Header                     Meaning
X-Sieve-Phase              Progressive-activation phase (OBSERVE / ACCUMULATE / ACTIVATE)
X-Sieve-Fact-Count         Facts in the memory store at request time
X-Sieve-Inbound-Tokens     Approximate token count of the inbound payload before Sieve's trim
X-Sieve-Outbound-Tokens    Approximate token count sent upstream after trim
X-Sieve-Rounds             Number of recall-tool iterations (0 = no tool call)
X-Sieve-Proxy-Us           Sieve-side wall time in microseconds (excludes upstream LLM time)

When to read them

  • Sanity-checking that Sieve is actually trimming: X-Sieve-Inbound-Tokens vs X-Sieve-Outbound-Tokens shows the reduction per request.
  • Progressive-activation introspection: X-Sieve-Phase tells you whether Sieve is still in cold-start (OBSERVE/ACCUMULATE) or fully active (ACTIVATE).
  • Latency attribution: X-Sieve-Proxy-Us measures Sieve's own overhead so you can separate it from upstream model time.
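
The three checks above can be sketched as a small helper that takes the headers of one response and derives the trim reduction, the cold-start status, and Sieve's overhead in milliseconds. This is a minimal sketch, not part of Sieve: the function name summarize_sieve_headers is hypothetical, and it assumes the token and timing headers carry plain decimal integers and that X-Sieve-Phase carries one of the three phase names verbatim.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value as an integer; return None if absent or malformed."""
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        return None


def summarize_sieve_headers(headers: Mapping[str, str]) -> dict:
    """Summarize the X-Sieve-* headers of a single response (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    # Assumption: these headers hold plain decimal integers.
    inbound = _to_int(h.get("x-sieve-inbound-tokens"))
    outbound = _to_int(h.get("x-sieve-outbound-tokens"))
    proxy_us = _to_int(h.get("x-sieve-proxy-us"))
    phase = h.get("x-sieve-phase")

    tokens_trimmed = None
    reduction = None
    if inbound is not None and outbound is not None:
        tokens_trimmed = inbound - outbound
        if inbound > 0:
            # Fraction of the inbound payload removed before going upstream.
            reduction = tokens_trimmed / inbound

    return {
        "phase": phase,
        # OBSERVE and ACCUMULATE are the cold-start phases; ACTIVATE is fully active.
        "cold_start": None if phase is None else phase in ("OBSERVE", "ACCUMULATE"),
        "tokens_trimmed": tokens_trimmed,
        "reduction": reduction,
        # Microseconds to milliseconds; upstream model time is not included.
        "proxy_ms": None if proxy_us is None else proxy_us / 1000,
    }


if __name__ == "__main__":
    # Illustrative values only, not real Sieve output.
    sample = {
        "X-Sieve-Phase": "ACTIVATE",
        "X-Sieve-Inbound-Tokens": "4000",
        "X-Sieve-Outbound-Tokens": "1000",
        "X-Sieve-Proxy-Us": "2500",
    }
    print(summarize_sieve_headers(sample))
```

The helper accepts any mapping of header names to values, so it should work with the header object of most HTTP clients. To estimate upstream model time, subtract proxy_ms from the total request latency you measure on the client side.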

When to enable deeper telemetry

If you need richer per-request metrics than the headers provide (retrieval precision, writer extraction counts, tool-call breakdowns), enable the built-in validation collector in sieve.yaml:

validation:
  enabled: true
  db_path: ~/.sieve/validation_metrics.db

The collector writes one SQLite row per intercepted request. It is off by default; turn it on only when you want the data, since it adds disk writes to the hot path.
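
Once the collector has been running, you can confirm that rows are accumulating by opening the database file read-only. The sketch below uses only Python's standard sqlite3 module. Because the collector's table layout is not described here, it makes no assumption about table or column names: it discovers the tables from sqlite_master and counts their rows. The function name table_row_counts is hypothetical.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    path = Path(db_path).expanduser().resolve()
    # Open read-only through a file: URI so that inspection never writes to the file.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier, since table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # Path taken from the db_path setting shown above.
    print(table_row_counts("~/.sieve/validation_metrics.db"))
```

Read-only mode raises sqlite3.OperationalError if the file does not exist yet, which is a quick signal that the collector is disabled or has not yet recorded a request.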