agent-admin 347d191935 fork: tests (29 green) + fork README + pytest config

Acceptance test suite under tests/ covers 8 of the 10 audit-defined
assertions directly (the 2 that require integration-level fixtures —
flush-subprocess-survives-hook-exit and whole-wiki-not-in-prompt
token-count — are documented as manual-test checks rather than
automated).

tests/test_fs_utils.py — 17 tests
  * Atomic write: roundtrip, overwrite, original-preserved-on-exception,
    parent-dir-creation.
  * Locked append: 4 concurrent workers × 25 entries each, asserts every
    entry appears exactly once and its body lines are contiguous. This
    is the acceptance criterion for "two concurrent flushes don't
    interleave writes."
  * JSON recovery: clean roundtrip, missing-file default, corruption
    produces timestamped .bak and returns default.
  * Wikilink parsing: bare / aliased / mixed; parse_wikilink strip.
  * Path safety: clean / traversal / absolute / empty / null-byte /
    aliased-but-safe.

tests/test_compile_chunking.py — 8 tests
  * Chunking: small log passthrough, byte-exact reconstruction,
    boundary respect, oversized-single-section, mixed-size packing.
  * State-on-failure: single-chunk SDK error does NOT update state;
    multi-chunk partial failure does NOT update state; all-chunks
    succeed DOES update state with hash + cost.

tests/test_lint_backlinks.py — 4 tests
  * Aliased wikilinks aren't flagged as broken links.
  * Aliased backlinks count as valid inbound references (the C9 fix).
  * QA articles referencing concepts don't trigger backlink suggestions.
  * Concept-to-concept asymmetry IS still reported (C9 scope is narrow).

FORK.md — fork-specific docs:
  * Summary of delta vs upstream (data-integrity, scaling, correctness,
    safety, configurability, hygiene categories)
  * Full env-var reference
  * Test invocation + coverage summary
  * Upstream sync guidance (cherry-pick, don't blind-pull)

Result: 29 passed in 0.07s. All patches in this fork verified via
automated test before any production use.

2026-04-24 17:54:00 -04:00

5.7 KiB

Raw Blame History

Memoria — production fork of claude-memory-compiler

This repository is a hardened fork of coleam00/claude-memory-compiler. Upstream is a fresh (2026-04-06) proof-of-concept; this fork adds the patches needed to run it as the backing memory system for a production Claude Code deployment.

Upstream's README still applies for the core architecture and workflow (daily logs → LLM compiler → knowledge articles + index → SessionStart injection). What's below is the delta.

What this fork changes

Data-integrity hardening

Atomic state writes. state.json and last-flush.json are written tmp-then-fsync-then-os.replace. A crash mid-write leaves the target unchanged instead of truncating to a partial or empty JSON.
Corruption recovery. On json.JSONDecodeError, the corrupt file is moved aside to <name>.bak-YYYYMMDDTHHMMSSZ, a warning is logged, and a default is returned. Prevents the silent full-recompile failure mode.
File-locked appends. Daily-log writes go through fcntl.flock-guarded append. Concurrent flush and pre-compact calls serialize through the lock; well-formed entries never interleave.
SDK retry with backoff. run_flush() retries up to 3 times on SDK exceptions (2s, 4s delays). On final failure the context file is NOT deleted and dedup state is NOT updated — the next flush retries cleanly instead of swallowing the loss.

Subprocess detachment

session-end.py and pre-compact.py pass start_new_session=True to subprocess.Popen on POSIX. flush.py runs in its own process group, surviving CC's post-hook signals. Upstream omits this, causing intermittent silent data loss when the flush subprocess is killed mid-LLM-call.

Scaling / prompt size

Index-only context. compile.py and query.py no longer inline every existing wiki article into the LLM prompt. The compiler receives the index and uses the Read tool to fetch specific articles. Fixes upstream issues #3/#5/#9 (prompt-size / cost explosion past ~50 articles).
Daily-log chunking. compile.py splits oversized daily logs along ### section boundaries before invoking the LLM. Threshold MAX_LOG_CHARS_PER_CHUNK (default 100_000; override via MEMORIA_MAX_LOG_CHARS). Partial failure keeps the log uncompiled so the next run retries.

Correctness

Aliased wikilinks. extract_wikilinks() and count_inbound_links() strip |display suffixes. Lint's broken-link, orphan, and missing-backlink checks no longer produce false positives on aliased forms (fixes upstream issues #7/#8).
QA/sources excluded from missing-backlink check. Q&A articles reference concepts without requiring reciprocal links — previously every Q&A that cited a concept would trigger a spurious suggestion.

Safety

Path-traversal guard. safe_article_path() resolves a wikilink slug inside KNOWLEDGE_DIR or returns None. wiki_article_exists() uses this guard; LLM-authored slugs like ../../etc/passwd cannot escape the knowledge tree.

Configurability

Timezone. TIMEZONE (default America/Chicago) is now wired through zoneinfo.ZoneInfo and used by now_iso() / today_iso() / maybe_trigger_compilation(). Override via MEMORIA_TZ. Unknown zones log a warning and fall back to system local time.
Compile trigger. The upstream 6 PM hardcoded gate is replaced with a staleness-based trigger: compile fires if the daily log changed AND MEMORIA_COMPILE_INTERVAL_MIN minutes (default 60) have elapsed since the last compile of that log. No more "wrote a log at 5:59 PM, never auto-compiles."
Model routing. Per-call-site model env vars:
- MEMORIA_COMPILE_MODEL (default sonnet)
- MEMORIA_QUERY_MODEL (default sonnet)
- MEMORIA_LINT_MODEL (default sonnet)
- Flush uses Haiku unconditionally (short summarization).

Hygiene

File-handle context manager in maybe_trigger_compilation() so compile.log handle is always cleaned up even on Popen failure.
Single asyncio.run() wrapping the compile loop (not per-file) to avoid event-loop churn.
python-dotenv removed from direct dependencies (was unused).
MIT LICENSE added (upstream has none).

Environment variables

Var	Default	Purpose
`MEMORIA_TZ`	`America/Chicago`	Timezone for date/time operations
`MEMORIA_COMPILE_INTERVAL_MIN`	`60`	Minutes between auto-compile triggers
`MEMORIA_MAX_LOG_CHARS`	`100000`	Daily-log chunk threshold
`MEMORIA_COMPILE_MODEL`	`sonnet`	Model for `compile.py`
`MEMORIA_QUERY_MODEL`	`sonnet`	Model for `query.py`
`MEMORIA_LINT_MODEL`	`sonnet`	Model for `lint.py` contradiction check

Tests

uv sync --extra test
uv run pytest tests/ -v

The test suite covers:

Atomic write behavior (including exception-path recovery)
Concurrent locked append with 4 workers × 25 entries each
JSON corruption recovery with .bak backup
Wikilink parsing (bare + aliased forms)
Path-traversal rejection (relative, absolute, null-byte, empty)
Daily-log chunking (small, oversized, mixed-sizes, boundary-respect)
compile.py state-on-failure (single-chunk failure and partial-chunk failure)
Lint backlink rules (aliased forms, QA/sources exclusions, concept-to-concept symmetry)

All 29 tests pass as of the fork: commit series.

Upstream sync

The upstream remote tracks coleam00/claude-memory-compiler for reference. Do not blindly git pull upstream main — our patches likely conflict. Review upstream changes, cherry-pick what's relevant, re-test.

License

MIT — see LICENSE. Upstream has no license file; author has stated FOSS-by-declaration intent.

5.7 KiB Raw Blame History Unescape Escape