Changelog¶
The full release history lives in CHANGELOG.md
at the repository root — this page mirrors it so visitors don't have
to leave the docs site.
Changelog¶
All notable changes to this project will be documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.0.0] — 2026-06-04¶
First public release.
Sieve is a transparent proxy that sits between your agent framework and your LLM endpoint. It rewrites bloated prompts into lean, on-demand context backed by an encrypted local memory store — without changing your agent or your endpoint.
Headline numbers¶
- 95% fewer tokens per turn, measured invariant across 5 LLM architectures, 8B–72B model sizes, 8K–64K context windows, and 1–64 concurrent sessions.
- 3–7× faster follow-ups on frontier models.
- Up to 9× less hallucination on absence-trap queries.
- Sub-15 ms recall at 100,000 facts with full production crypto.
Install¶
Then point your agent at http://127.0.0.1:11435 instead of your usual LLM
endpoint. That is the whole integration.
What ships¶
sieve-install— one-command first-run setup. Detects Anthropic / OpenAI / OpenAI-compatible / Ollama / custom providers; picks a model; downloads the embedding model; initialises the encrypted store; optionally enables autostart.sieve— day-to-day CLI. Start / stop / restart / status / demo / benchmark / update, plus subcommand groups forstore,config,key, andbackup.sieve wizard— state-aware interactive menu for day-two operations.sieve demo— sandboxed 6-turn scripted conversation that demonstrates fact extraction and the absence-signal trap.sieve benchmark— reproducible baseline-vs-Sieve comparison with multi-run aggregation and a shareable markdown report.sieve update— on-request PyPI check. Zero auto-telemetry.
Architecture¶
- FastAPI proxy on a configurable port (default
11435). - In-process FastEmbed (BAAI/bge-small-en-v1.5, ONNX Runtime, ~50 MB) for embeddings — no separate embedding service to run.
- Three-tier retrieval pipeline: fingerprint → vector → cross-encoder rerank. Query decomposition for multi-hop.
- Three-phase progressive-activation lifecycle (OBSERVE → ACCUMULATE → ACTIVATE) so cold-start behaviour doesn't degrade answer quality before the store has enough material.
- Absence-signal layer (on by default) that refuses to fabricate on facts not in the store.
- SQLCipher-encrypted local memory store with key rotation, encrypted backups, and schema versioning.
Privacy and licensing¶
- Zero telemetry. Sieve does not phone home. The only outbound traffic
is whatever your LLM endpoint already makes;
sieve updatetalks to PyPI only when you invoke it. - Local-first. Your conversation history and extracted facts never leave your machine.
- Apache License 2.0. Patent pending — UK patent application GB2608859.1 (filed 16 April 2026). See PATENT_NOTICE.
Compatibility¶
- Python 3.11, 3.12, 3.13
- Linux, macOS (Intel + Apple Silicon), Windows via WSL2
- Any OpenAI-compatible or Ollama endpoint