
Corpus System & Document Retrieval

June 16, 2026

What the corpus is

The corpus is a document retrieval system integrated into the War Room. It acts as a research library that the AI can access during your session. When you load a document into the corpus, the AI can search it, quote from it, and reference it in its analysis.

This addresses a fundamental limitation of AI-assisted research: without a corpus, the AI knows only what was in its training data. With a corpus, it can also draw on your specific research context, including your prior results, your notation, your definitions, and the specific papers you are building on.

What to put in the corpus

The corpus is most valuable when loaded with:

  • Prior session results — if you proved a lemma in Session 1, load it into the corpus for Session 2. The AI will know about it and can build on it.
  • Reference papers — papers you are citing, extending, or responding to. The AI can check its reasoning against the source material.
  • Definitions and notation — your specific notation conventions, nonstandard definitions, or domain-specific terminology. This prevents the AI from using different conventions that confuse the analysis.
  • Partial results — work in progress that needs verification, extension, or critique.

How retrieval works

When the AI needs to reference corpus material during a session, it searches the loaded documents by relevance to the current query. The retrieval is semantic — it finds passages that are conceptually relevant, not just keyword matches.

The AI cites its corpus sources explicitly, so you can verify that it is referencing the material correctly. If it misinterprets a loaded document, you can correct it in the next round.
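The article does not say how the War Room computes relevance. A common way to implement semantic retrieval is to represent each passage and the query as embedding vectors and rank passages by cosine similarity. The sketch below assumes that approach. `Passage`, `retrieve`, the document names, and the toy 3-dimensional vectors are all hypothetical; a real system would obtain embeddings from a model. Each result keeps its source name, which is what makes explicit citation possible.

```python
import math
from dataclasses import dataclass


@dataclass
class Passage:
    source: str              # document the passage came from, used for citation
    text: str
    embedding: list[float]   # assumed to be precomputed by some embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], passages: list[Passage],
             k: int = 2) -> list[tuple[float, Passage]]:
    """Return the k passages most similar to the query, highest score first."""
    scored = [(cosine(query_embedding, p.embedding), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" for illustration only (hypothetical values).
corpus = [
    Passage("session1_lemmas.pdf", "Lemma 2: the operator T is bounded on L^2.", [0.9, 0.1, 0.0]),
    Passage("notation.md", "We write ||f|| for the L^2 norm throughout.", [0.6, 0.7, 0.1]),
    Passage("reference_paper.pdf", "Historical remarks on the problem.", [0.0, 0.2, 0.9]),
]

query = [0.85, 0.3, 0.05]  # stands in for the embedding of a query like "is T bounded?"
for score, passage in retrieve(query, corpus):
    # Every result carries its source, so an answer built from it can cite it.
    print(f"[{passage.source}] {score:.3f}  {passage.text}")
```

Ranking by vector similarity rather than shared words is what lets a passage surface when it is conceptually related to the query even if it uses different terms.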

Corpus scope

The corpus is scoped to the current session and the documents you explicitly load into it. It does not automatically include documents from previous sessions unless you load them. This is intentional — it keeps the retrieval focused and prevents old, potentially outdated material from influencing current analysis without your knowledge.
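The scoping rule can be pictured as a corpus object that belongs to a single session and starts empty. This is a hypothetical model for illustration, not the War Room's actual data structure; `SessionCorpus`, `load`, and the document names are invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SessionCorpus:
    """Hypothetical per-session corpus: starts empty, holds only what is loaded."""
    session_id: int
    documents: dict[str, str] = field(default_factory=dict)

    def load(self, name: str, text: str) -> None:
        # Explicit loading is the only way a document enters this session's corpus.
        self.documents[name] = text

    def names(self) -> list[str]:
        return sorted(self.documents)


session1 = SessionCorpus(1)
session1.load("definitions.md", "Notation and definitions for the project.")
session1.load("lemma_results.md", "Lemma 1 holds for all n >= 1.")

# A new session inherits nothing from session 1 by default.
session2 = SessionCorpus(2)
print(session2.names())  # []

# Carrying a result forward is a deliberate, per-document choice.
session2.load("lemma_results.md", session1.documents["lemma_results.md"])
print(session2.names())  # ['lemma_results.md']
```

Because nothing crosses sessions implicitly, anything the AI retrieves in session 2 is something you chose to load there.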

Building a research program

The corpus system enables multi-session research programs:

  • Session 1: Establish definitions and prove foundational lemmas.
  • Session 2: Load the Session 1 results into the corpus. Extend them to the main theorem.
  • Session 3: Load the results from Sessions 1 and 2. Test edge cases and counterexamples.
  • Session 4: Load all prior results. Write up the formal proof.

Each session builds on the last, with the corpus providing the continuity thread. This mirrors how human research programs work — you do not re-derive your foundations every time you sit down.

Matthew J. Goss, Jr.
Retired COMEX/NYMEX floor trader, Goldman Sachs and FlexTrade Systems alumnus, multi-instrumentalist, published author, and independent mathematics researcher. Founder of Quantiterate.