Diagnostic response headers
Sieve attaches a small set of response headers to intercepted `/api/chat`
and `/v1/chat/completions` requests for operational visibility. These
headers are stable across releases, so you can write scripts and
monitoring that depend on them.
| Header | Meaning |
|---|---|
| `X-Sieve-Phase` | Progressive-activation phase (OBSERVE / ACCUMULATE / ACTIVATE) |
| `X-Sieve-Fact-Count` | Facts in the memory store at request time |
| `X-Sieve-Inbound-Tokens` | Approximate token count of the inbound payload before Sieve's trim |
| `X-Sieve-Outbound-Tokens` | Approximate token count sent upstream after trim |
| `X-Sieve-Rounds` | Number of recall-tool iterations (0 = no tool call) |
| `X-Sieve-Proxy-Us` | Sieve-side wall time in microseconds (excludes upstream LLM time) |
When to read them
- Sanity-checking that Sieve is actually trimming: `X-Sieve-Inbound-Tokens` vs `X-Sieve-Outbound-Tokens` shows the reduction per request.
- Progressive-activation introspection: `X-Sieve-Phase` tells you whether Sieve is still in cold-start (OBSERVE/ACCUMULATE) or fully active (ACTIVATE).
- Latency attribution: `X-Sieve-Proxy-Us` measures Sieve's own overhead so you can separate it from upstream model time.
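
As a rough illustration of the three checks above, the following Python sketch turns a response's headers into a small summary. It is not part of Sieve: the function name `summarize_sieve_headers` and the returned keys are hypothetical. It also assumes that the numeric headers carry plain decimal integers and that `X-Sieve-Phase` carries the phase name exactly as written in the table, neither of which is spelled out here.

```python
from typing import Mapping, Optional


def _as_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value as an integer; return None if absent or malformed."""
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        return None


def summarize_sieve_headers(headers: Mapping[str, str]) -> dict:
    """Summarize Sieve's diagnostic headers (hypothetical helper).

    Assumes numeric headers are decimal integers and the phase header holds
    OBSERVE, ACCUMULATE, or ACTIVATE verbatim.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    inbound = _as_int(lowered.get("x-sieve-inbound-tokens"))
    outbound = _as_int(lowered.get("x-sieve-outbound-tokens"))
    proxy_us = _as_int(lowered.get("x-sieve-proxy-us"))
    phase = lowered.get("x-sieve-phase")

    # Fraction of inbound tokens removed by the trim (0.0 means no reduction).
    reduction = None
    if inbound and outbound is not None:
        reduction = 1.0 - outbound / inbound

    return {
        "phase": phase,
        "cold_start": phase in ("OBSERVE", "ACCUMULATE") if phase else None,
        "token_reduction": reduction,
        # Convert microseconds to milliseconds for easier reading.
        "proxy_ms": proxy_us / 1000 if proxy_us is not None else None,
    }


if __name__ == "__main__":
    # Made-up values for illustration only, not real Sieve output.
    sample = {
        "X-Sieve-Phase": "ACTIVATE",
        "X-Sieve-Inbound-Tokens": "1000",
        "X-Sieve-Outbound-Tokens": "250",
        "X-Sieve-Proxy-Us": "1800",
    }
    print(summarize_sieve_headers(sample))
```

The function accepts any mapping of header names to string values, so the headers object from most HTTP clients can be passed in directly. Missing or malformed headers yield `None` rather than raising, which keeps a monitoring script from failing on responses that Sieve did not intercept.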
When to enable deeper telemetry
If you need per-request metrics (retrieval precision, writer extraction
counts, tool-call breakdowns), enable the built-in validation collector
in sieve.yaml:
One SQLite row is written per intercepted request. The collector is off by default; turn it on only when you want the data, since it adds disk writes to the hot path.