memoria/scripts/query.py
agent-admin 03296be47a fork: scaling fixes (index-only context + chunking + model wiring)
Fixes upstream issues #3/#5/#9 (whole-wiki in every prompt) and adds
large-log chunking. Addresses the audit's P1 scaling findings (C1),
the chunking requirement operator added on top, C8 explicit model
wiring across all LLM call sites, and D3 single-event-loop refactor.

## compile.py

- **Index-only context.** The `existing_articles_context` concatenation
  of every wiki article has been removed from the prompt. Instead the
  LLM receives only the index + schema + daily log and uses the Read
  tool (already in allowed_tools) to fetch specific articles it decides
  are relevant. Prompt size stays bounded regardless of KB growth —
  upstream's 250K-token prompts past ~100 articles are gone.

- **Chunking.** `_split_log_into_chunks()` splits oversized daily logs
  along `### ` section boundaries. Threshold MAX_LOG_CHARS_PER_CHUNK
  (default 100K chars ≈ 25K tokens, configurable via
  MEMORIA_MAX_LOG_CHARS). Chunks compile via separate LLM calls that
  naturally merge through Edit on shared files. Oversized single
  sections emit as their own chunks rather than splitting mid-thought.

- **Atomic state on chunked compile.** State is only written after
  ALL chunks succeed — partial-failure leaves the log flagged as
  uncompiled in state.json so the next run retries it cleanly. Was
  already correct for single-chunk logs (early return on SDK error)
  and now correct for multi-chunk too.

- **Explicit model.** `model=COMPILE_MODEL` passed to
  ClaudeAgentOptions. Default "sonnet"; override via
  MEMORIA_COMPILE_MODEL env var.

- **D3: single asyncio.run.** The per-file `asyncio.run()` in the
  compile loop is replaced with one outer call wrapping `_compile_all`.
  Avoids repeated event-loop setup/teardown and matches the pattern
  used for async resources in the SDK.

## query.py

- **Index-only context.** `read_all_wiki_content()` replaced with
  `read_wiki_index()`. The LLM reads the index and uses its Read tool
  to fetch specific articles. Same rationale as compile.py — keeps
  prompt size bounded and cost predictable.

- **Explicit model.** `model=QUERY_MODEL`, default "sonnet", override
  via MEMORIA_QUERY_MODEL.

## lint.py

- **C9: skip qa/sources in missing-backlink check.** Articles under
  qa/ or sources/ no longer trigger a suggestion that every referenced
  concept should backlink to them. Concepts aren't expected to link
  back to every Q&A that mentions them — doing so would drown real
  relationships.

- **Alias-aware backlink detection.** Uses `extract_wikilinks()` to
  parse the target's link list so `[[concepts/foo|Display]]` forms
  count as valid backlinks (previously required exact `[[foo]]` match,
  causing false positives on aliased forms).

- **Explicit model.** `model=LINT_MODEL` in check_contradictions call,
  default "sonnet", override via MEMORIA_LINT_MODEL.

## Verified

- Chunking: 120K-char 3-section log splits into 80K + 40K, reconstructs
  byte-exact. Oversized single section (150K) emits as its own chunk.
  Small log (<100K) returns as single chunk.
- All patched modules import cleanly with expected config values.
- compile_daily_log / query.run_query / flush.maybe_trigger_compilation
  / lint.check_missing_backlinks all callable post-patch.
2026-04-24 17:48:48 -04:00

154 lines
4.9 KiB
Python

"""
Query the knowledge base using index-guided retrieval (no RAG).
The LLM reads the index, picks relevant articles, and synthesizes an answer.
No vector database, no embeddings, no chunking - just structured markdown
and an index the LLM can reason over.
Usage:
uv run python query.py "How should I handle auth redirects?"
uv run python query.py "What patterns do I use for API design?" --file-back
"""
from __future__ import annotations
import argparse
import asyncio
import os
from pathlib import Path
from config import KNOWLEDGE_DIR, QA_DIR, now_iso
from utils import load_state, read_wiki_index, save_state
ROOT_DIR = Path(__file__).resolve().parent.parent
# Query model (Sonnet by default — synthesis over the retrieved articles
# benefits from strong reasoning; override via MEMORIA_QUERY_MODEL).
QUERY_MODEL = os.environ.get("MEMORIA_QUERY_MODEL", "sonnet")
async def run_query(question: str, file_back: bool = False) -> str:
"""Query the knowledge base and optionally file the answer back.
Unlike upstream, we do NOT inline the entire wiki into the prompt — the
LLM receives the index only and uses its Read tool to fetch articles
it decides are relevant. Keeps prompt size bounded regardless of
knowledge-base size and avoids the whole-wiki-in-prompt cost wall
documented in upstream issues #3/#5/#9.
"""
from claude_agent_sdk import (
AssistantMessage,
ClaudeAgentOptions,
ResultMessage,
TextBlock,
query,
)
wiki_index = read_wiki_index()
tools = ["Read", "Glob", "Grep"]
if file_back:
tools.extend(["Write", "Edit"])
file_back_instructions = ""
if file_back:
timestamp = now_iso()
file_back_instructions = f"""
## File Back Instructions
After answering, do the following:
1. Create a Q&A article at {QA_DIR}/ with the filename being a slugified version
of the question (e.g., knowledge/qa/how-to-handle-auth-redirects.md)
2. Use the Q&A article format from the schema (frontmatter with title, question,
consulted articles, filed date)
3. Update {KNOWLEDGE_DIR / 'index.md'} with a new row for this Q&A article
4. Append to {KNOWLEDGE_DIR / 'log.md'}:
## [{timestamp}] query (filed) | question summary
- Question: {question}
- Consulted: [[list of articles read]]
- Filed to: [[qa/article-name]]
"""
prompt = f"""You are a knowledge base query engine. Answer the user's question by
consulting the knowledge base.
## How to Answer
1. Read the INDEX section first - it lists every article with a one-line summary
2. Identify 3-10 articles that are relevant to the question
3. Use the Read tool to fetch those articles (they live at
{KNOWLEDGE_DIR}/concepts/, {KNOWLEDGE_DIR}/connections/, and
{KNOWLEDGE_DIR}/qa/). Only read articles you actually need — do not
read the entire wiki.
4. Synthesize a clear, thorough answer
5. Cite your sources using [[wikilinks]] (e.g., [[concepts/supabase-auth]])
6. If the knowledge base doesn't contain relevant information, say so honestly
## Knowledge Base Index
{wiki_index}
## Question
{question}
{file_back_instructions}"""
answer = ""
cost = 0.0
try:
async for message in query(
prompt=prompt,
options=ClaudeAgentOptions(
cwd=str(ROOT_DIR),
model=QUERY_MODEL,
system_prompt={"type": "preset", "preset": "claude_code"},
allowed_tools=tools,
permission_mode="acceptEdits",
max_turns=15,
),
):
if isinstance(message, AssistantMessage):
for block in message.content:
if isinstance(block, TextBlock):
answer += block.text
elif isinstance(message, ResultMessage):
cost = message.total_cost_usd or 0.0
except Exception as e:
answer = f"Error querying knowledge base: {e}"
# Update state
state = load_state()
state["query_count"] = state.get("query_count", 0) + 1
state["total_cost"] = state.get("total_cost", 0.0) + cost
save_state(state)
return answer
def main():
parser = argparse.ArgumentParser(description="Query the personal knowledge base")
parser.add_argument("question", help="The question to ask")
parser.add_argument(
"--file-back",
action="store_true",
help="File the answer back into the knowledge base as a Q&A article",
)
args = parser.parse_args()
print(f"Question: {args.question}")
print(f"File back: {'yes' if args.file_back else 'no'}")
print("-" * 60)
answer = asyncio.run(run_query(args.question, file_back=args.file_back))
print(answer)
if args.file_back:
print("\n" + "-" * 60)
qa_count = len(list(QA_DIR.glob("*.md"))) if QA_DIR.exists() else 0
print(f"Answer filed to knowledge/qa/ ({qa_count} Q&A articles total)")
if __name__ == "__main__":
main()