fork: scaling fixes (index-only context + chunking + model wiring)

Fixes upstream issues #3/#5/#9 (whole-wiki in every prompt) and adds
large-log chunking. Addresses the audit's P1 scaling findings (C1),
the chunking requirement the operator added on top, C8 explicit model
wiring across all LLM call sites, and the D3 single-event-loop refactor.

## compile.py

- **Index-only context.** The `existing_articles_context` concatenation
  of every wiki article has been removed from the prompt. Instead the
  LLM receives only the index + schema + daily log and uses the Read
  tool (already in allowed_tools) to fetch specific articles it decides
  are relevant. Prompt size stays bounded regardless of KB growth —
  upstream's 250K-token prompts past ~100 articles are gone.

- **Chunking.** `_split_log_into_chunks()` splits oversized daily logs
  along `### ` section boundaries. Threshold MAX_LOG_CHARS_PER_CHUNK
  (default 100K chars ≈ 25K tokens, configurable via
  MEMORIA_MAX_LOG_CHARS). Each chunk compiles in its own LLM call; the
  results merge because every call applies Edit to the same wiki files.
  An oversized single section is emitted as its own chunk rather than
  being split mid-thought.

- **Atomic state on chunked compile.** State is only written after
  ALL chunks succeed — partial-failure leaves the log flagged as
  uncompiled in state.json so the next run retries it cleanly. Was
  already correct for single-chunk logs (early return on SDK error)
  and now correct for multi-chunk too.

- **Explicit model.** `model=COMPILE_MODEL` passed to
  ClaudeAgentOptions. Default "sonnet"; override via
  MEMORIA_COMPILE_MODEL env var.

- **D3: single asyncio.run.** The per-file `asyncio.run()` in the
  compile loop is replaced with one outer call wrapping `_compile_all`.
  Avoids repeated event-loop setup/teardown and matches the pattern
  used for async resources in the SDK.
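
A minimal sketch of how the all-or-nothing state write and the single
outer `asyncio.run()` fit together. `compile_chunk`, `compile_log`,
`compile_all`, `main`, and the dict-based state are hypothetical
stand-ins for illustration; the real `_compile_all` and the state.json
handling are not reproduced here.

```python
import asyncio


async def compile_chunk(chunk: str) -> bool:
    """Hypothetical stand-in for one LLM call; returns True on success."""
    await asyncio.sleep(0)  # placeholder for the real SDK round trip
    return "FAIL" not in chunk


async def compile_log(name: str, chunks: list[str], state: dict[str, bool]) -> bool:
    """Compile chunks in order; mark the log compiled only if all succeed."""
    for chunk in chunks:
        if not await compile_chunk(chunk):
            # State is left untouched, so the next run retries this log.
            return False
    state[name] = True
    return True


async def compile_all(logs: dict[str, list[str]], state: dict[str, bool]) -> None:
    """Process every log inside the same event loop."""
    for name, chunks in logs.items():
        await compile_log(name, chunks, state)


def main(logs: dict[str, list[str]], state: dict[str, bool]) -> None:
    # One outer asyncio.run() instead of one call per file.
    asyncio.run(compile_all(logs, state))
```

With this shape, a log whose second chunk fails never appears in
`state`, while fully compiled logs do.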

## query.py

- **Index-only context.** `read_all_wiki_content()` replaced with
  `read_wiki_index()`. The LLM reads the index and uses its Read tool
  to fetch specific articles. Same rationale as compile.py — keeps
  prompt size bounded and cost predictable.

- **Explicit model.** `model=QUERY_MODEL`, default "sonnet", override
  via MEMORIA_QUERY_MODEL.

## lint.py

- **C9: skip qa/sources in missing-backlink check.** Articles under
  qa/ or sources/ no longer trigger a suggestion that every referenced
  concept should backlink to them. Concepts aren't expected to link
  back to every Q&A that mentions them — doing so would drown real
  relationships.

- **Alias-aware backlink detection.** Uses `extract_wikilinks()` to
  parse the target's link list so `[[concepts/foo|Display]]` forms
  count as valid backlinks (previously required exact `[[foo]]` match,
  causing false positives on aliased forms).

- **Explicit model.** `model=LINT_MODEL` in check_contradictions call,
  default "sonnet", override via MEMORIA_LINT_MODEL.
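
The two backlink changes above can be sketched as follows. This is an
illustration only: the regex, `extract_link_targets`,
`needs_backlink_check`, and `has_backlink` are assumed names, and the
repository's own `extract_wikilinks()` may parse links differently.

```python
import re

# Matches [[target]] and [[target|Display]]; captures only the target part.
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_link_targets(text: str) -> list[str]:
    """Return link targets with any |alias suffix dropped."""
    return [target.strip() for target in WIKILINK_RE.findall(text)]


def needs_backlink_check(source_slug: str) -> bool:
    """Articles under qa/ or sources/ are exempt from the backlink check."""
    return not source_slug.startswith(("qa/", "sources/"))


def has_backlink(target_text: str, source_slug: str) -> bool:
    """True if the target article links back to source_slug, aliased or not."""
    return source_slug in extract_link_targets(target_text)
```

Comparing parsed targets rather than raw substrings means an aliased
link counts, and a longer slug that merely starts with the same text
does not.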

## Verified

- Chunking: 120K-char 3-section log splits into 80K + 40K, reconstructs
  byte-exact. Oversized single section (150K) emits as its own chunk.
  Small log (<100K) returns as single chunk.
- All patched modules import cleanly with expected config values.
- compile_daily_log / query.run_query / flush.maybe_trigger_compilation
  / lint.check_missing_backlinks all callable post-patch.
agent-admin 2026-04-24 17:48:48 -04:00
parent 39ab2a8b6f
commit 03296be47a
3 changed files with 213 additions and 68 deletions

@@ -14,16 +14,28 @@ from __future__ import annotations

 import argparse
 import asyncio
+import os
 from pathlib import Path

 from config import KNOWLEDGE_DIR, QA_DIR, now_iso
-from utils import load_state, read_all_wiki_content, save_state
+from utils import load_state, read_wiki_index, save_state

 ROOT_DIR = Path(__file__).resolve().parent.parent

+# Query model (Sonnet by default: synthesis over the retrieved articles
+# benefits from strong reasoning; override via MEMORIA_QUERY_MODEL).
+QUERY_MODEL = os.environ.get("MEMORIA_QUERY_MODEL", "sonnet")
+

 async def run_query(question: str, file_back: bool = False) -> str:
-    """Query the knowledge base and optionally file the answer back."""
+    """Query the knowledge base and optionally file the answer back.
+
+    Unlike upstream, we do NOT inline the entire wiki into the prompt; the
+    LLM receives the index only and uses its Read tool to fetch articles
+    it decides are relevant. Keeps prompt size bounded regardless of
+    knowledge-base size and avoids the whole-wiki-in-prompt cost wall
+    documented in upstream issues #3/#5/#9.
+    """
     from claude_agent_sdk import (
         AssistantMessage,
         ClaudeAgentOptions,
@@ -32,7 +44,7 @@ async def run_query(question: str, file_back: bool = False) -> str:
         query,
     )

-    wiki_content = read_all_wiki_content()
+    wiki_index = read_wiki_index()

     tools = ["Read", "Glob", "Grep"]
     if file_back:
@@ -59,20 +71,23 @@ After answering, do the following:
 """

     prompt = f"""You are a knowledge base query engine. Answer the user's question by
-consulting the knowledge base below.
+consulting the knowledge base.

 ## How to Answer

 1. Read the INDEX section first - it lists every article with a one-line summary
 2. Identify 3-10 articles that are relevant to the question
-3. Read those articles carefully (they're included below)
+3. Use the Read tool to fetch those articles (they live at
+   {KNOWLEDGE_DIR}/concepts/, {KNOWLEDGE_DIR}/connections/, and
+   {KNOWLEDGE_DIR}/qa/). Only read articles you actually need; do not
+   read the entire wiki.
 4. Synthesize a clear, thorough answer
 5. Cite your sources using [[wikilinks]] (e.g., [[concepts/supabase-auth]])
 6. If the knowledge base doesn't contain relevant information, say so honestly

-## Knowledge Base
+## Knowledge Base Index

-{wiki_content}
+{wiki_index}

 ## Question
@@ -87,6 +102,7 @@ consulting the knowledge base below.
         prompt=prompt,
         options=ClaudeAgentOptions(
             cwd=str(ROOT_DIR),
+            model=QUERY_MODEL,
             system_prompt={"type": "preset", "preset": "claude_code"},
             allowed_tools=tools,
             permission_mode="acceptEdits",