Self-evolving knowledge base for Claude Code — fork of coleam00/claude-memory-compiler, hardened for production use.
agent-admin 347d191935 fork: tests (29 green) + fork README + pytest config
Acceptance test suite under tests/ covers 8 of the 10 audit-defined
assertions directly. The other 2 require integration-level fixtures
(flush-subprocess-survives-hook-exit and whole-wiki-not-in-prompt
token-count) and are documented as manual-test checks rather than
automated ones.

tests/test_fs_utils.py — 17 tests
  * Atomic write: roundtrip, overwrite, original-preserved-on-exception,
    parent-dir-creation.
  * Locked append: 4 concurrent workers × 25 entries each, asserts every
    entry appears exactly once and its body lines are contiguous. This
    is the acceptance criterion for "two concurrent flushes don't
    interleave writes."
  * JSON recovery: clean roundtrip, missing-file default, corruption
    produces timestamped .bak and returns default.
  * Wikilink parsing: bare / aliased / mixed; parse_wikilink strip.
  * Path safety: clean / traversal / absolute / empty / null-byte /
    aliased-but-safe.
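
For readers unfamiliar with the atomic-write pattern these tests exercise, here is a minimal sketch of one common way to do it: write to a temporary file in the same directory, then swap it into place with os.replace. The function name atomic_write_text and its signature are assumptions for illustration; the fork's actual helper may differ.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Hypothetical helper: readers see the old file or the new one, never a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)  # the parent-dir-creation case
    # The temp file lives next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # swaps the new file in as a single step
    except BaseException:
        # On any failure the original file is left untouched; only the temp file is removed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the write raises partway through, the target still holds its previous contents, which is the behavior the original-preserved-on-exception test describes.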

tests/test_compile_chunking.py — 8 tests
  * Chunking: small log passthrough, byte-exact reconstruction,
    boundary respect, oversized-single-section, mixed-size packing.
  * State-on-failure: single-chunk SDK error does NOT update state;
    multi-chunk partial failure does NOT update state; all-chunks
    succeed DOES update state with hash + cost.
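
The chunking properties listed above can be illustrated with a small sketch. It assumes that sections in a daily log start at lines beginning with "## " and that chunk size is measured in UTF-8 bytes; both are assumptions, and the names split_sections and chunk_log are hypothetical rather than taken from compile.py.

```python
def split_sections(log: str, marker: str = "## ") -> list[str]:
    """Split a log into sections, each beginning at a line that starts with marker."""
    sections: list[str] = []
    current: list[str] = []
    for line in log.splitlines(keepends=True):  # keepends preserves every character
        if line.startswith(marker) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def chunk_log(log: str, max_bytes: int, marker: str = "## ") -> list[str]:
    """Greedily pack whole sections into chunks of at most max_bytes.

    A single section larger than max_bytes becomes its own oversized chunk
    rather than being cut mid-section.
    """
    chunks: list[str] = []
    current = ""
    for section in split_sections(log, marker):
        candidate_size = len((current + section).encode("utf-8"))
        if current and candidate_size > max_bytes:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks
```

Because chunks are built only by concatenating unmodified sections in order, joining them back together reproduces the input exactly, which is what a byte-exact reconstruction test would check.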

tests/test_lint_backlinks.py — 4 tests
  * Aliased wikilinks aren't flagged as broken links.
  * Aliased backlinks count as valid inbound references (the C9 fix).
  * QA articles referencing concepts don't trigger backlink suggestions.
  * Concept-to-concept asymmetry IS still reported (C9 scope is narrow).
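
To make the aliased-wikilink cases concrete, here is a minimal sketch of alias-aware link parsing. It assumes the [[target|alias]] syntax; parse_wikilink is named in the test list above, but its signature here, the regular expression, and extract_targets are assumptions for illustration.

```python
import re

# Matches the text between [[ and ]], disallowing nested brackets.
WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse_wikilink(raw: str) -> str:
    """Return the link target from the text inside [[...]], dropping any |alias."""
    return raw.split("|", 1)[0].strip()


def extract_targets(text: str) -> list[str]:
    """Collect the targets of every wikilink in text, bare or aliased."""
    return [parse_wikilink(inner) for inner in WIKILINK_RE.findall(text)]
```

A linter that compares extract_targets output against the set of existing article paths would treat [[concepts/locking|file locks]] and [[concepts/locking]] as the same inbound reference.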

FORK.md — fork-specific docs:
  * Summary of delta vs upstream (data-integrity, scaling, correctness,
    safety, configurability, hygiene categories)
  * Full env-var reference
  * Test invocation + coverage summary
  * Upstream sync guidance (cherry-pick, don't blind-pull)

Result: 29 passed in 0.07s. All patches in this fork verified via
automated test before any production use.
2026-04-24 17:54:00 -04:00
.claude Claude Code Memory Compiler 2026-04-06 09:26:30 -05:00
hooks fork: MIT LICENSE + foundation patches (atomicity, locking, safety) 2026-04-24 17:44:07 -04:00
scripts fork: scaling fixes (index-only context + chunking + model wiring) 2026-04-24 17:48:48 -04:00
tests fork: tests (29 green) + fork README + pytest config 2026-04-24 17:54:00 -04:00
.gitignore URL change for repo in README 2026-04-06 14:46:55 -05:00
AGENTS.md Claude Code Memory Compiler 2026-04-06 09:26:30 -05:00
FORK.md fork: tests (29 green) + fork README + pytest config 2026-04-24 17:54:00 -04:00
LICENSE fork: MIT LICENSE + foundation patches (atomicity, locking, safety) 2026-04-24 17:44:07 -04:00
pyproject.toml fork: MIT LICENSE + foundation patches (atomicity, locking, safety) 2026-04-24 17:44:07 -04:00
README.md URL change for repo in README 2026-04-06 14:46:55 -05:00
uv.lock fork: tests (29 green) + fork README + pytest config 2026-04-24 17:54:00 -04:00

LLM Personal Knowledge Base

Your AI conversations compile themselves into a searchable knowledge base.

Adapted from Karpathy's LLM Knowledge Base architecture, but instead of clipping web articles, the raw data is your own conversations with Claude Code. When a session ends (or auto-compacts mid-session), Claude Code hooks capture the conversation transcript and spawn a background process. That process uses the Claude Agent SDK to extract what matters (decisions, lessons learned, patterns, gotchas) and appends it to a daily log. You then compile those daily logs into structured, cross-referenced knowledge articles organized by concept. Retrieval uses a simple index file instead of RAG: no vector database, no embeddings, just markdown.

Anthropic has clarified that personal use of the Claude Agent SDK is covered under your existing Claude subscription (Max, Team, or Enterprise) - no separate API credits needed. Unlike OpenClaw, which requires API billing for its memory flush, this runs on your subscription.

Quick Start

Tell your AI coding agent:

"Clone https://github.com/coleam00/claude-memory-compiler into this project. Set up the Claude Code hooks so my conversations automatically get captured into daily logs, compiled into a knowledge base, and injected back into future sessions. Read the AGENTS.md for the full technical reference on how everything works."

The agent will:

  1. Clone the repo and run uv sync to install dependencies
  2. Copy .claude/settings.json into your project (or merge the hooks into your existing settings)

The hooks activate automatically the next time you open Claude Code.
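
The merge mentioned in step 2 could look roughly like the sketch below. It assumes settings.json keeps hooks under a top-level "hooks" key that maps each event name to a list of entries; the function name merge_hooks is hypothetical, and the hook commands used in any example are placeholders, not the repo's real ones.

```python
import copy


def merge_hooks(existing: dict, incoming: dict) -> dict:
    """Return a copy of existing settings with incoming hook entries added.

    Assumed layout: {"hooks": {"<EventName>": [entry, ...]}}. Entries already
    present are skipped, so running the merge twice changes nothing.
    """
    merged = copy.deepcopy(existing)  # never mutate the caller's settings
    target = merged.setdefault("hooks", {})
    for event, entries in incoming.get("hooks", {}).items():
        current = target.setdefault(event, [])
        for entry in entries:
            if entry not in current:
                current.append(copy.deepcopy(entry))
    return merged
```

Keys outside "hooks" are carried over unchanged, so an existing project configuration keeps its other settings.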

From there, your conversations start accumulating. After 6 PM local time, the next session flush automatically triggers compilation of that day's logs into knowledge articles. You can also run uv run python scripts/compile.py manually at any time.
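
The after-6-PM trigger can be pictured as a small predicate. This is a sketch under stated assumptions: it tracks the last compiled calendar date, whereas the real flush.py may decide differently (for example by comparing log hashes). The names COMPILE_AFTER and should_compile are hypothetical.

```python
from datetime import date, datetime, time
from typing import Optional

COMPILE_AFTER = time(18, 0)  # 6 PM local time, per the description above


def should_compile(now: datetime, last_compiled: Optional[date]) -> bool:
    """Return True when a flush at local time `now` should also start compilation.

    Compilation runs at most once per day and only at or after COMPILE_AFTER.
    """
    if now.time() < COMPILE_AFTER:
        return False
    return last_compiled != now.date()
```

A flush at 17:59 returns False, while the first flush at or after 18:00 returns True until that day's date is recorded as compiled.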

How It Works

Conversation -> SessionEnd/PreCompact hooks -> flush.py extracts knowledge
    -> daily/YYYY-MM-DD.md -> compile.py -> knowledge/concepts/, connections/, qa/
        -> SessionStart hook injects index into next session -> cycle repeats
  • Hooks capture conversations automatically (session end + pre-compaction safety net)
  • flush.py calls the Claude Agent SDK to decide what's worth saving, and after 6 PM triggers end-of-day compilation automatically
  • compile.py turns daily logs into organized concept articles with cross-references (triggered automatically or run manually)
  • query.py answers questions using index-guided retrieval (no RAG needed at personal scale)
  • lint.py runs 7 health checks, including broken links, orphans, contradictions, and staleness

Key Commands

uv run python scripts/compile.py                    # compile new daily logs
uv run python scripts/query.py "question"            # ask the knowledge base
uv run python scripts/query.py "question" --file-back # ask + save answer back
uv run python scripts/lint.py                        # run health checks
uv run python scripts/lint.py --structural-only      # free structural checks only

Why No RAG?

Karpathy's insight: at personal scale (50-500 articles), an LLM reading a structured index.md outperforms vector similarity. The LLM understands what you're really asking; cosine similarity just finds similar words. RAG becomes necessary at roughly 2,000 or more articles, once the index no longer fits in the context window.
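
A rough way to tell when an index is approaching that limit is to estimate its token count and compare it against the available context. The sketch below assumes about 4 characters per token, a common rule of thumb for English text and not an exact tokenizer count. The function names, the window size, and the reserve are hypothetical values you would set yourself.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; real tokenizers will give different numbers."""
    return int(len(text) / chars_per_token)


def index_fits(index_text: str, context_window_tokens: int, reserve_tokens: int) -> bool:
    """Return True if the index fits after reserving room for the question and the answer."""
    return estimate_tokens(index_text) <= context_window_tokens - reserve_tokens
```

Running index_fits over the contents of index.md as the knowledge base grows gives an early signal that index-only retrieval is nearing its ceiling.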

Technical Reference

See AGENTS.md for the complete technical reference: article formats, hook architecture, script internals, cross-platform details, costs, and customization options. AGENTS.md is designed to give an AI agent everything it needs to understand, modify, or rebuild the system.