# Memoria — production fork of claude-memory-compiler

This repository is a hardened fork of [coleam00/claude-memory-compiler](https://github.com/coleam00/claude-memory-compiler). Upstream is a fresh (2026-04-06) proof-of-concept; this fork adds the patches needed to run it as the backing memory system for a production Claude Code deployment.

Upstream's README still applies for the core architecture and workflow (daily logs → LLM compiler → knowledge articles + index → SessionStart injection). What's below is the delta.

---

## What this fork changes

### Data-integrity hardening

- **Atomic state writes.** `state.json` and `last-flush.json` are written tmp-then-fsync-then-`os.replace`. A crash mid-write leaves the target unchanged instead of truncating to a partial or empty JSON.
- **Corruption recovery.** On `json.JSONDecodeError`, the corrupt file is moved aside to `.bak-YYYYMMDDTHHMMSSZ`, a warning is logged, and a default is returned. Prevents the silent full-recompile failure mode.
- **File-locked appends.** Daily-log writes go through `fcntl.flock`-guarded append. Concurrent flush and pre-compact calls serialize through the lock; well-formed entries never interleave.
- **SDK retry with backoff.** `run_flush()` retries up to 3 times on SDK exceptions (2s, 4s delays). On final failure the context file is NOT deleted and dedup state is NOT updated — the next flush retries cleanly instead of swallowing the loss.

### Subprocess detachment

- `session-end.py` and `pre-compact.py` pass `start_new_session=True` to `subprocess.Popen` on POSIX. `flush.py` runs in its own process group, surviving CC's post-hook signals. Upstream omits this, causing intermittent silent data loss when the flush subprocess is killed mid-LLM-call.

### Scaling / prompt size

- **Index-only context.** `compile.py` and `query.py` no longer inline every existing wiki article into the LLM prompt. The compiler receives the index and uses the `Read` tool to fetch specific articles.
  Fixes upstream issues #3/#5/#9 (prompt-size / cost explosion past ~50 articles).
- **Daily-log chunking.** `compile.py` splits oversized daily logs along `### ` section boundaries before invoking the LLM. Threshold `MAX_LOG_CHARS_PER_CHUNK` (default 100_000; override via `MEMORIA_MAX_LOG_CHARS`). Partial failure keeps the log uncompiled so the next run retries.

### Correctness

- **Aliased wikilinks.** `extract_wikilinks()` and `count_inbound_links()` strip `|display` suffixes. Lint's broken-link, orphan, and missing-backlink checks no longer produce false positives on aliased forms (fixes upstream issues #7/#8).
- **QA/sources excluded from missing-backlink check.** Q&A articles reference concepts without requiring reciprocal links — previously every Q&A that cited a concept would trigger a spurious suggestion.

### Safety

- **Path-traversal guard.** `safe_article_path()` resolves a wikilink slug inside `KNOWLEDGE_DIR` or returns `None`. `wiki_article_exists()` uses this guard; LLM-authored slugs like `../../etc/passwd` cannot escape the knowledge tree.

### Configurability

- **Timezone.** `TIMEZONE` (default `America/Chicago`) is now wired through `zoneinfo.ZoneInfo` and used by `now_iso()` / `today_iso()` / `maybe_trigger_compilation()`. Override via `MEMORIA_TZ`. Unknown zones log a warning and fall back to system local time.
- **Compile trigger.** The upstream 6 PM hardcoded gate is replaced with a staleness-based trigger: compile fires if the daily log changed AND `MEMORIA_COMPILE_INTERVAL_MIN` minutes (default 60) have elapsed since the last compile of that log. No more "wrote a log at 5:59 PM, never auto-compiles."
- **Model routing.** Per-call-site model env vars:
  - `MEMORIA_COMPILE_MODEL` (default `sonnet`)
  - `MEMORIA_QUERY_MODEL` (default `sonnet`)
  - `MEMORIA_LINT_MODEL` (default `sonnet`)
  - Flush uses Haiku unconditionally (short summarization).

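
The timezone fallback and the staleness-based compile trigger described above can be sketched roughly as follows. This is an illustrative sketch, not the fork's actual code: the function names `resolve_timezone`, `interval_from_env`, and `should_compile` are hypothetical, the use of a content hash to detect "the daily log changed" is an assumption, and so is the choice to compile immediately when a log has never been compiled.

```python
import logging
import os
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

log = logging.getLogger("memoria.sketch")


def resolve_timezone(name):
    """Return a ZoneInfo for `name`, or None meaning 'use system local time'.

    Hypothetical helper: mirrors the documented behavior of warning and
    falling back when the zone name is unknown.
    """
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        log.warning("unknown timezone %r; falling back to system local time", name)
        return None


def interval_from_env():
    """Read the compile interval in minutes (documented default: 60)."""
    return int(os.environ.get("MEMORIA_COMPILE_INTERVAL_MIN", "60"))


def should_compile(log_hash, last_hash, last_compiled_at, now, interval_min=60):
    """Staleness gate: fire only if the log changed AND the interval elapsed.

    Assumptions (not confirmed by the source): change detection compares
    content hashes, and a log with no prior compile fires immediately.
    """
    if log_hash == last_hash:
        return False  # log unchanged since the last compile
    if last_compiled_at is None:
        return True  # never compiled (assumed behavior)
    return now - last_compiled_at >= timedelta(minutes=interval_min)


if __name__ == "__main__":
    last = datetime(2026, 4, 6, 17, 0, tzinfo=timezone.utc)
    now = last + timedelta(minutes=90)
    print(should_compile("new-hash", "old-hash", last, now, interval_from_env()))
```

Because the gate depends only on elapsed time since the last compile and not on the wall-clock hour, a log written at 5:59 PM becomes eligible as soon as the interval has passed, which is the failure mode the fixed 6 PM gate could not handle.
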
### Hygiene

- File-handle context manager in `maybe_trigger_compilation()` so `compile.log` handle is always cleaned up even on Popen failure.
- Single `asyncio.run()` wrapping the compile loop (not per-file) to avoid event-loop churn.
- `python-dotenv` removed from direct dependencies (was unused).
- MIT LICENSE added (upstream has none).

---

## Environment variables

| Var | Default | Purpose |
|-----|---------|---------|
| `MEMORIA_TZ` | `America/Chicago` | Timezone for date/time operations |
| `MEMORIA_COMPILE_INTERVAL_MIN` | `60` | Minutes between auto-compile triggers |
| `MEMORIA_MAX_LOG_CHARS` | `100000` | Daily-log chunk threshold |
| `MEMORIA_COMPILE_MODEL` | `sonnet` | Model for `compile.py` |
| `MEMORIA_QUERY_MODEL` | `sonnet` | Model for `query.py` |
| `MEMORIA_LINT_MODEL` | `sonnet` | Model for `lint.py` contradiction check |

---

## Tests

```
uv sync --extra test
uv run pytest tests/ -v
```

The test suite covers:

- Atomic write behavior (including exception-path recovery)
- Concurrent locked append with 4 workers × 25 entries each
- JSON corruption recovery with `.bak` backup
- Wikilink parsing (bare + aliased forms)
- Path-traversal rejection (relative, absolute, null-byte, empty)
- Daily-log chunking (small, oversized, mixed-sizes, boundary-respect)
- `compile.py` state-on-failure (single-chunk failure and partial-chunk failure)
- Lint backlink rules (aliased forms, QA/sources exclusions, concept-to-concept symmetry)

All 29 tests pass as of the `fork:` commit series.

---

## Upstream sync

The `upstream` remote tracks `coleam00/claude-memory-compiler` for reference. **Do not blindly `git pull upstream main`** — our patches likely conflict. Review upstream changes, cherry-pick what's relevant, re-test.

---

## License

MIT — see [LICENSE](LICENSE). Upstream has no license file; author has stated FOSS-by-declaration intent.