Skip to content

Changelog

The full release history lives in CHANGELOG.md at the repository root — this page mirrors it so visitors don't have to leave the docs site.


Changelog

All notable changes to this project will be documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.0.0] — 2026-06-04

First public release.

Sieve is a transparent proxy that sits between your agent framework and your LLM endpoint. It rewrites bloated prompts into lean, on-demand context backed by an encrypted local memory store — without changing your agent or your endpoint.

Headline numbers

  • 95% fewer tokens per turn, measured invariant across 5 LLM architectures, 8B–72B model sizes, 8K–64K context windows, and 1–64 concurrent sessions.
  • 3–7× faster follow-ups on frontier models.
  • Up to 9× less hallucination on absence-trap queries.
  • Sub-15 ms recall at 100,000 facts with full production crypto.

Install

pipx install llm-sieve
sieve-install

Then point your agent at http://127.0.0.1:11435 instead of your usual LLM endpoint. That is the whole integration.

What ships

  • sieve-install — one-command first-run setup. Detects Anthropic / OpenAI / OpenAI-compatible / Ollama / custom providers; picks a model; downloads the embedding model; initialises the encrypted store; optionally enables autostart.
  • sieve — day-to-day CLI. Start / stop / restart / status / demo / benchmark / update, plus subcommand groups for store, config, key, and backup.
  • sieve wizard — state-aware interactive menu for day-two operations.
  • sieve demo — sandboxed 6-turn scripted conversation that demonstrates fact extraction and the absence-signal trap.
  • sieve benchmark — reproducible baseline-vs-Sieve comparison with multi-run aggregation and a shareable markdown report.
  • sieve update — on-request PyPI check. Zero auto-telemetry.

Architecture

  • FastAPI proxy on a configurable port (default 11435).
  • In-process FastEmbed (BAAI/bge-small-en-v1.5, ONNX Runtime, ~50 MB) for embeddings — no separate embedding service to run.
  • Three-tier retrieval pipeline: fingerprint → vector → cross-encoder rerank. Query decomposition for multi-hop.
  • Three-phase progressive-activation lifecycle (OBSERVE → ACCUMULATE → ACTIVATE) so cold-start behaviour doesn't degrade answer quality before the store has enough material.
  • Absence-signal layer (on by default) that refuses to fabricate on facts not in the store.
  • SQLCipher-encrypted local memory store with key rotation, encrypted backups, and schema versioning.

Privacy and licensing

  • Zero telemetry. Sieve does not phone home. The only outbound traffic is whatever your LLM endpoint already makes; sieve update talks to PyPI only when you invoke it.
  • Local-first. Your conversation history and extracted facts never leave your machine.
  • Apache License 2.0. Patent pending — UK patent application GB2608859.1 (filed 16 April 2026). See PATENT_NOTICE.

Compatibility

  • Python 3.11, 3.12, 3.13
  • Linux, macOS (Intel + Apple Silicon), Windows via WSL2
  • Any OpenAI-compatible or Ollama endpoint