Skip to content

Sieve, mem0, Zep: three shapes of agent memory

Applies to: Sieve v1.0.x

If you're shopping for a memory layer for an LLM agent in 2026, three credible shapes are on the table: an SDK you call from application code (mem0), a managed platform you push conversations into (Zep), and a transparent proxy that sits in the traffic path (Sieve). They get compared as if they were interchangeable. They aren't — and the differences that matter are architectural, not benchmark decimals.

We build Sieve, so read this knowing where our incentives sit. In exchange: every claim about the other two links to their docs and repos, fetched and quoted on 2026-06-10, and we'll be plain about where each of them is the better choice.

The integration contract

The deepest difference between the three is who has to know the memory layer exists.

mem0 is an SDK with explicit calls. Your application calls memory.search() before each LLM turn, makes its own LLM call, then calls memory.add() afterwards. mem0's docs describe the add pipeline plainly: "You trigger this pipeline with a single add call". There is also an OpenAI-compatible client mode that wraps memory around chat-completion calls — though it's still a client you adopt in code, not a network-level interposition. mem0 ships Python and TypeScript SDKs, as a library, self-hosted server, or managed cloud.

Zep is a platform with a push/pull contract. You create users and threads, push each message in with thread.add_messages(), and pull a "context block" out with thread.get_user_context(), which your app then splices into its own prompt — the quickstart guide walks exactly that loop. The LLM call remains entirely yours. Server-side, Zep builds a temporal graph of what it ingests.

Sieve is a proxy. You change one base URL — 127.0.0.1:11434 becomes 127.0.0.1:11435 — and keep your code. Stripping repeated context, learning facts, and injecting relevant ones back happens in the traffic path; the client doesn't know the proxy is there. The trade-offs of that choice (and there are real ones) got their own post: Why Sieve.

None of these is "right." They encode different beliefs about where memory belongs: in your code, in a platform, or in the pipe.

What happens on a turn

mem0 runs LLM-based extraction over your messages: an LLM call pulls out facts, conflicts are resolved, and results land in a vector store. As of its April 2026 v3 algorithm, extraction is "single-pass ADD-only... one LLM call, no UPDATE/DELETE", and "Mem0 requires an LLM to function, with gpt-5-mini from OpenAI as the default". Worth knowing if you read older comparisons: graph memory was removed in v3 in favour of built-in entity linking, so "vector + graph" descriptions of mem0 are out of date.

Zep ingests "episodes" (messages, JSON, text) and maintains a temporal context graph — facts carry validity intervals, so it can represent "worked at X until March, then Y." That engine is Graphiti, open source under Apache 2.0, and genuinely interesting work. Running Graphiti yourself requires a graph database (Neo4j, FalkorDB, or Amazon Neptune) plus an LLM for ingestion — it "defaults to OpenAI for LLM inference and embedding," with local servers (Ollama, vLLM) supported.

Sieve does two jobs per turn. Outbound, it strips what the model has already seen — tool schemas, repeated instructions, stale history — before the payload leaves your machine. In the background it extracts durable facts using the same LLM endpoint you already configured and embeds them with a local model (no separate API key, no extra vendor). On later turns it injects only the facts the turn needs, and gates absence: a question about something the store has never seen should produce a refusal, not an invention.

The emphasis differs accordingly. mem0 and Zep are primarily recall systems — their benchmarks measure answer accuracy over long histories. Sieve treats payload reduction as a first-class goal alongside recall, because we think the per-turn token bill is the quieter, larger problem — the argument in The hidden cost of context.

Where your data lives

This one divides the field cleanly.

mem0: your choice. The OSS library defaults to a local Qdrant plus SQLite history, with ~20 vector backends in Python; the hosted platform keeps memories on their cloud. Self-hosting the server is supported and documented.

Zep: the cloud is the product. The self-hostable Community Edition was deprecated in April 2025 ("we've decided to stop maintaining and releasing Zep Community Edition"), with open-source effort concentrated on Graphiti. Zep Cloud operates in AWS us-west-2; enterprise tiers add bring-your-own-key and deploy-in-your-VPC options, and the platform holds SOC 2 Type II certification with HIPAA BAAs for enterprise customers.

Sieve: local only, by design. Facts live in a SQLCipher-encrypted SQLite file under ~/.sieve/, embeddings are computed locally, there is no account and no telemetry. If your LLM endpoint is local too, nothing leaves the machine. There is deliberately no cloud to trust — which also means no managed offering if you wanted one.

What they cost

As of 2026-06-10, from the vendors' own pricing pages:

  • mem0: OSS is Apache 2.0. The hosted platform has a free tier (10,000 add requests and 1,000 retrieval requests a month) with paid tiers from $19/month. Self-hosted, your real cost is the extraction LLM calls and the vector store.
  • Zep: credit-metered ingestion — "1 credit per Episode up to 350 bytes... 0 credits for retrieval, storage, threads, users, and graph storage". Free tier is 1,000 credits/month; the Flex tier is $104/month billed annually ($125 month-to-month) including 50,000 credits.
  • Sieve: Apache 2.0, no hosted tier, no metering. The cost is your own compute — the same endpoint that runs your agent runs the extraction.

The numbers everyone quotes

mem0's research page (updated May 2026) reports LoCoMo 92.5 and LongMemEval 94.4 at under 7,000 tokens per retrieval. Zep's research page reports LoCoMo 94.7% and LongMemEval 90.2% with sub-200ms retrieval.

Notice anything? Each vendor leads the other on one of the same two benchmarks. That's not cherry-picking by either of them so much as a property of the genre: different readers, judges, and harnesses produce different numbers on the same datasets, and every vendor naturally publishes the configuration that suits their architecture. Treat all such tables — including any we publish — as claims about a specific harness, not facts about the product.

Sieve's position: we don't currently publish cross-tool benchmark numbers, and we'd rather hand you the harness than the table. sieve benchmark runs a baseline-vs-Sieve comparison on your hardware with your model in five to ten minutes, and the demo's absence-trap turn is reproducible on any model you pull. When we do publish numbers, they link to methodology you can re-run.

Where each one wins

Pick mem0 when memory is a feature of your application logic — you want programmatic control over what gets remembered for which user, you're building multi-tenant SaaS, you already live in Python or TypeScript, and an extraction LLM call per add is acceptable. The ecosystem is the largest of the three (~58k GitHub stars as of June 2026) and the backend flexibility is real.

Pick Zep when you want memory as managed infrastructure — you have compliance requirements (SOC 2, HIPAA BAA), you want temporal reasoning over entity relationships, and you're happy with a cloud service in the loop. Graphiti alone is worth a look (~27k stars) if you want to run a temporal context graph yourself and don't mind operating Neo4j and paying for ingestion-time LLM calls.

Pick Sieve when the thing you're protecting is the client code and the data. You can't or won't modify the agent (closed tools, many heterogeneous clients, or just discipline about coupling); you want everything on disk encrypted and on your own machine; you care as much about the size of every outbound payload as about recall; and single-user-per-store matches your deployment — a personal agent, a workstation, one proxy per user.

Sieve is the wrong choice when you need a multi-tenant memory backend for a hosted product, graph-shaped queries over entity relationships, or a vendor to operate it for you. Those are mem0 and Zep's home turf, and we'd rather you pick them than bend Sieve into a shape it doesn't have. It's also the youngest project of the three — v1.0 shipped this month — and doesn't pretend otherwise.

The rubric, compressed

mem0 Zep Sieve
Shape SDK (+optional client wrapper) Managed platform Transparent proxy
Code changes add/search calls push/pull + prompt splice base URL only
Data lives your infra or their cloud their cloud (us-west-2)* your disk, encrypted
Extraction LLM required (default OpenAI) platform-side / required for Graphiti your existing endpoint
Primary goal recall accuracy recall + temporal graph payload reduction + recall + absence handling
License / price Apache 2.0 / free tier, from $19/mo Graphiti Apache 2.0 / from $104/mo Apache 2.0 / free

*Enterprise BYOK/BYOC options exist.

Memory for agents is young enough that the shapes haven't converged, and honest comparison beats benchmark arithmetic. Know which contract you're signing — code, platform, or pipe — and the rest of the decision mostly makes itself.


This post was drafted with AI assistance and reviewed by the Sieve maintainer before publication. Competitor facts were fetched from the linked primary sources on 2026-06-10 and quoted verbatim where shown; if we've misrepresented either project, open an issue and we'll correct it. Sieve is open source under Apache 2.0.