agent-admin 86c7dc9ded config: default timezone → America/Indiana/Indianapolis
Deployment is in Fort Wayne, IN (Eastern — most of central/eastern
Indiana uses America/Indiana/Indianapolis). Upstream defaulted to
America/Chicago which would have caused staleness-gate and daily-log
date rollover to fire an hour off from local wall-clock.

Override still available via MEMORIA_TZ env var for other deployments.
Unknown zone behaviour unchanged (warn + fall back to system local).

FORK.md env-var table + divergences section updated to match.
2026-04-24 17:56:32 -04:00

# Memoria — production fork of claude-memory-compiler
This repository is a hardened fork of [coleam00/claude-memory-compiler](https://github.com/coleam00/claude-memory-compiler).
Upstream is a fresh (2026-04-06) proof-of-concept; this fork adds the
patches needed to run it as the backing memory system for a production
Claude Code deployment.

Upstream's README still applies for the core architecture and workflow
(daily logs → LLM compiler → knowledge articles + index → SessionStart
injection). What's below is the delta.

---
## What this fork changes
### Data-integrity hardening
- **Atomic state writes.** `state.json` and `last-flush.json` are written
tmp-then-fsync-then-`os.replace`. A crash mid-write leaves the target
unchanged instead of truncating to a partial or empty JSON.
- **Corruption recovery.** On `json.JSONDecodeError`, the corrupt file is
moved aside to `<name>.bak-YYYYMMDDTHHMMSSZ`, a warning is logged, and
a default is returned. Prevents the silent full-recompile failure mode.
- **File-locked appends.** Daily-log writes go through
`fcntl.flock`-guarded append. Concurrent flush and pre-compact calls
serialize through the lock; well-formed entries never interleave.
- **SDK retry with backoff.** `run_flush()` makes up to 3 attempts on SDK
exceptions, waiting 2 s and then 4 s between them. On final failure the
context file is NOT deleted and dedup state is NOT updated, so the next
flush retries cleanly instead of swallowing the loss.
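
The four behaviors above can be sketched with the Python standard library. This is an illustration, not the fork's code. The helper names `save_json_atomic`, `load_json_or_default`, `locked_append`, and `call_with_retry` are hypothetical. The retry helper also assumes that "3 attempts with 2 s / 4 s waits" is the intended schedule.

```python
import datetime
import fcntl
import json
import logging
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Callable

log = logging.getLogger("memoria-sketch")


def save_json_atomic(path: Path, data: Any) -> None:
    """tmp-then-fsync-then-os.replace: a crash never leaves a truncated target."""
    # Same directory as the target, so os.replace() is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # data is on disk before the rename
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # the original target is untouched
        raise


def load_json_or_default(path: Path, default: Any) -> Any:
    """Parse JSON; if corrupt, move it to <name>.bak-<UTC stamp> and return default."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return default
    except json.JSONDecodeError:
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        backup = path.with_name(f"{path.name}.bak-{stamp}")
        os.replace(path, backup)
        log.warning("corrupt %s moved to %s; using default", path, backup)
        return default


def locked_append(path: Path, entry: str) -> None:
    """Append one entry while holding an exclusive flock, so writers never interleave."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until other writers release
        try:
            fh.write(entry)
            fh.flush()
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def call_with_retry(fn: Callable[[], Any], attempts: int = 3, base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fn up to `attempts` times, waiting 2 s, then 4 s; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # caller must leave the context file and dedup state alone
            sleep(base_delay * 2 ** (attempt - 1))
```

The temp file is created in the target's own directory. That keeps it on the same filesystem, where `os.replace()` performs an atomic rename.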
### Subprocess detachment
- `session-end.py` and `pre-compact.py` pass `start_new_session=True` to
`subprocess.Popen` on POSIX, so `flush.py` runs in its own session and
process group and survives the signals Claude Code (CC) sends after the
hook exits. Upstream omits this, causing intermittent silent data loss
when the flush subprocess is killed mid-LLM-call.
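
Below is a minimal sketch of the detached launch, using a hypothetical `spawn_detached()` helper. The real hooks' argument lists and log paths are not shown in this document. Opening the log inside a `with` block also illustrates the `compile.log` fix listed under Hygiene. The parent's handle is closed even if `Popen` raises, while the child keeps its own copy of the descriptor.

```python
import os
import subprocess
import sys
from pathlib import Path


def spawn_detached(cmd: list, log_path: Path) -> subprocess.Popen:
    """Start cmd in a new session so signals aimed at the hook's group miss it."""
    # The with-block closes the parent's log handle even if Popen raises;
    # the child process keeps its own inherited descriptor.
    with open(log_path, "ab") as log_fh:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_fh,
            stderr=subprocess.STDOUT,
            start_new_session=(os.name == "posix"),  # child calls setsid() before exec
        )


if __name__ == "__main__":
    # Usage: the child reports whether it leads its own session.
    proc = spawn_detached(
        [sys.executable, "-c", "import os; print(os.getsid(0) == os.getpid())"],
        Path("flush-demo.log"),
    )
    proc.wait()
```

With `start_new_session=True`, the child calls `setsid()` before it execs. It therefore leads a new session and a new process group.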
### Scaling / prompt size
- **Index-only context.** `compile.py` and `query.py` no longer inline
every existing wiki article into the LLM prompt. The compiler receives
the index and uses the `Read` tool to fetch specific articles. Fixes
upstream issues #3/#5/#9 (prompt-size / cost explosion past ~50
articles).
- **Daily-log chunking.** `compile.py` splits oversized daily logs along
`### ` section boundaries before invoking the LLM. Threshold
`MAX_LOG_CHARS_PER_CHUNK` (default 100_000; override via
`MEMORIA_MAX_LOG_CHARS`). Partial failure keeps the log uncompiled so
the next run retries.
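
Below is a sketch of section-boundary chunking. It greedily packs whole `### ` sections into chunks of at most `MAX_LOG_CHARS_PER_CHUNK` characters, and it treats any text before the first heading as its own section. The document does not say how the fork handles a single section larger than the limit. This sketch assumes such a section is emitted as its own oversized chunk. `chunk_daily_log` is a hypothetical name.

```python
import os
import re
from typing import List

# Documented default, overridable via MEMORIA_MAX_LOG_CHARS.
MAX_LOG_CHARS_PER_CHUNK = int(os.environ.get("MEMORIA_MAX_LOG_CHARS", "100000"))

# Zero-width split point just before any line that starts with "### ".
_SECTION_START = re.compile(r"(?m)^(?=### )")


def chunk_daily_log(text: str, max_chars: int = MAX_LOG_CHARS_PER_CHUNK) -> List[str]:
    """Greedily pack whole ### sections into chunks of at most max_chars characters."""
    if len(text) <= max_chars:
        return [text]
    sections = [s for s in _SECTION_START.split(text) if s]
    chunks: List[str] = []
    current = ""
    for section in sections:
        # Close the current chunk if this section would push it over the limit.
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section  # assumption: an oversized lone section becomes its own chunk
    if current:
        chunks.append(current)
    return chunks
```

The chunks concatenate back to the original log. A caller can therefore compile them in order and mark the log as compiled only after the last chunk succeeds, which matches the retry-on-partial-failure behavior described above.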
### Correctness
- **Aliased wikilinks.** `extract_wikilinks()` and `count_inbound_links()`
strip `|display` suffixes. Lint's broken-link, orphan, and
missing-backlink checks no longer produce false positives on aliased
forms (fixes upstream issues #7/#8).
- **QA/sources excluded from missing-backlink check.** Q&A articles
reference concepts without requiring reciprocal links — previously
every Q&A that cited a concept would trigger a spurious suggestion.
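
The alias handling amounts to dropping everything after the first `|` inside `[[...]]`. The sketch below is illustrative only. The regex is an assumption, and so is the `count_inbound_links(pages)` signature (a dict mapping slug to body); neither is the fork's actual interface.

```python
import re
from collections import Counter
from typing import Dict, List

WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def extract_wikilinks(text: str) -> List[str]:
    """Return link targets with any |display alias removed: [[a/b|Shown]] -> 'a/b'."""
    return [m.group(1).split("|", 1)[0].strip() for m in WIKILINK_RE.finditer(text)]


def count_inbound_links(pages: Dict[str, str]) -> Counter:
    """Count how many *other* pages link to each slug; aliased and bare forms match."""
    counts: Counter = Counter()
    for slug, body in pages.items():
        for target in set(extract_wikilinks(body)):  # one vote per linking page
            if target != slug:  # self-links are not inbound links
                counts[target] += 1
    return counts
```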
### Safety
- **Path-traversal guard.** `safe_article_path()` resolves a wikilink
slug inside `KNOWLEDGE_DIR` or returns `None`. `wiki_article_exists()`
uses this guard; LLM-authored slugs like `../../etc/passwd` cannot
escape the knowledge tree.
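
Below is a sketch of the traversal guard. It assumes that slugs map to `<KNOWLEDGE_DIR>/<slug>.md` and that `knowledge_dir` is passed in rather than read from a module constant. The core check resolves the candidate path, which collapses `..` and follows symlinks, and then rejects anything that does not stay under the resolved root. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path
from typing import Optional


def safe_article_path(slug: str, knowledge_dir: Path) -> Optional[Path]:
    """Map a wikilink slug to <knowledge_dir>/<slug>.md, or None if it would escape."""
    if not slug or "\x00" in slug:
        return None  # empty slugs and NUL bytes are rejected outright
    root = knowledge_dir.resolve()
    # An absolute slug replaces root entirely in the join; resolve() collapses "..".
    candidate = (root / f"{slug}.md").resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        return None
    return candidate
```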
### Configurability
- **Timezone.** `TIMEZONE` (default `America/Indiana/Indianapolis`, i.e.
Eastern time for Fort Wayne and central/eastern Indiana) is now wired
through `zoneinfo.ZoneInfo` and used by `now_iso()` / `today_iso()` /
`maybe_trigger_compilation()`. Override via `MEMORIA_TZ`. Unknown zones
log a warning and fall back to system local time. Upstream defaulted
to `America/Chicago`; we diverge to match the deployment.
- **Compile trigger.** Upstream's hardcoded 6 PM gate is replaced with
a staleness-based trigger: compile fires if the daily log changed AND
`MEMORIA_COMPILE_INTERVAL_MIN` minutes (default 60) have elapsed since
the last compile of that log. No more "wrote a log at 5:59 PM, never
auto-compiles."
- **Model routing.** Per-call-site model env vars:
  - `MEMORIA_COMPILE_MODEL` (default `sonnet`)
  - `MEMORIA_QUERY_MODEL` (default `sonnet`)
  - `MEMORIA_LINT_MODEL` (default `sonnet`)
  - Flush uses Haiku unconditionally (short summarization).
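
The three configuration knobs above can be sketched together. Everything here is hypothetical, including the helper names. The sketch also assumes two details the document does not specify:

- a "changed" log is detected by comparing a SHA-256 of its contents against the hash recorded at the last compile;
- the flush model is the literal alias `"haiku"` (the document only says Haiku).

```python
import hashlib
import logging
import os
from datetime import datetime, timedelta, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

log = logging.getLogger("memoria-sketch")
DEFAULT_TZ = "America/Indiana/Indianapolis"


def resolve_timezone(name: Optional[str] = None) -> tzinfo:
    """Use MEMORIA_TZ (or the default); on an unknown zone, warn and use system local."""
    name = name or os.environ.get("MEMORIA_TZ", DEFAULT_TZ)
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        log.warning("unknown timezone %r; falling back to system local time", name)
        return datetime.now().astimezone().tzinfo


TZ = resolve_timezone()


def now_iso() -> str:
    return datetime.now(TZ).isoformat(timespec="seconds")


def today_iso() -> str:
    return datetime.now(TZ).date().isoformat()  # date rollover follows TZ, not UTC


def should_compile(log_text: str, last_hash: Optional[str],
                   last_compiled_at: Optional[datetime], now: datetime,
                   interval_min: Optional[int] = None) -> bool:
    """Staleness gate: the log changed AND interval_min minutes passed since last compile."""
    if interval_min is None:
        interval_min = int(os.environ.get("MEMORIA_COMPILE_INTERVAL_MIN", "60"))
    # Assumption: "changed" means the content hash differs from the recorded one.
    if hashlib.sha256(log_text.encode("utf-8")).hexdigest() == last_hash:
        return False
    if last_compiled_at is None:
        return True  # changed and never compiled
    return now - last_compiled_at >= timedelta(minutes=interval_min)


def model_for(call_site: str) -> str:
    """compile/query/lint read MEMORIA_<SITE>_MODEL (default 'sonnet'); flush is fixed."""
    if call_site == "flush":
        return "haiku"  # assumption: exact Haiku model identifier is not documented
    return os.environ.get(f"MEMORIA_{call_site.upper()}_MODEL", "sonnet")
```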
### Hygiene
- File-handle context manager in `maybe_trigger_compilation()` so the
`compile.log` handle is always closed, even if `Popen` fails.
- Single `asyncio.run()` wrapping the compile loop (not per-file) to
avoid event-loop churn.
- `python-dotenv` removed from direct dependencies (was unused).
- MIT LICENSE added (upstream has none).
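
Below is a sketch of the single-loop pattern from the Hygiene list. `compile_all()` is a hypothetical coroutine, and `demo_compile()` stands in for the real SDK-backed per-log call. The stand-in records which event loop each file ran on.

```python
import asyncio
from typing import Awaitable, Callable, List

seen_loops: List[int] = []


async def compile_all(paths: List[str], compile_one: Callable[[str], Awaitable[None]]) -> None:
    """Compile logs one after another inside a single running event loop."""
    for path in paths:
        await compile_one(path)


async def demo_compile(path: str) -> None:
    # Hypothetical stand-in for the per-log compile call; records the loop it ran on.
    await asyncio.sleep(0)
    seen_loops.append(id(asyncio.get_running_loop()))


def run_compile_batch(paths: List[str]) -> None:
    # One asyncio.run() for the whole batch. Calling asyncio.run() per file would
    # create and close a fresh event loop for every log.
    asyncio.run(compile_all(paths, demo_compile))
```
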
---
## Environment variables
| Var | Default | Purpose |
|-----|---------|---------|
| `MEMORIA_TZ` | `America/Indiana/Indianapolis` | Timezone for date/time operations |
| `MEMORIA_COMPILE_INTERVAL_MIN` | `60` | Minutes between auto-compile triggers |
| `MEMORIA_MAX_LOG_CHARS` | `100000` | Daily-log chunk threshold |
| `MEMORIA_COMPILE_MODEL` | `sonnet` | Model for `compile.py` |
| `MEMORIA_QUERY_MODEL` | `sonnet` | Model for `query.py` |
| `MEMORIA_LINT_MODEL` | `sonnet` | Model for `lint.py` contradiction check |

---
## Tests
```
uv sync --extra test
uv run pytest tests/ -v
```
The test suite covers:
- Atomic write behavior (including exception-path recovery)
- Concurrent locked append with 4 workers × 25 entries each
- JSON corruption recovery with `.bak` backup
- Wikilink parsing (bare + aliased forms)
- Path-traversal rejection (relative, absolute, null-byte, empty)
- Daily-log chunking (small, oversized, mixed-sizes, boundary-respect)
- `compile.py` state-on-failure (single-chunk failure and partial-chunk failure)
- Lint backlink rules (aliased forms, QA/sources exclusions, concept-to-concept symmetry)

All 29 tests pass as of the `fork:` commit series.

---
## Upstream sync
The `upstream` remote tracks `coleam00/claude-memory-compiler` for
reference. **Do not blindly `git pull upstream main`** — our patches
likely conflict. Review upstream changes, cherry-pick what's relevant,
re-test.

---
## License
MIT: see [LICENSE](LICENSE). Upstream has no license file; its author has
only declared an intent for the project to be FOSS.