Conducting Mathematical Research with AI

June 16, 2026 · 3 min read

The right mental model

The most common mistake in AI-assisted mathematics is treating the AI as an oracle — asking it to solve your problem and accepting whatever it says. This produces unreliable results because AI models can be confidently wrong about mathematics.

The better mental model: the AI is a very fast, very knowledgeable research assistant with no judgment about its own reliability. It can try approaches you have not thought of, perform tedious computations quickly, and explore many branches of a proof tree simultaneously. But you are responsible for verifying its work.

Formulating problems effectively

The quality of the AI's mathematical output depends heavily on the quality of the input. An effective problem formulation does four things:

Be precise about definitions. If you are using nonstandard notation or definitions, state them explicitly. Do not assume the AI knows your convention for epsilon-delta limits or your particular definition of a topological space.

State what you have already tried. This prevents the AI from spending a round on an approach you have already explored and found wanting.

Specify what you want. "Prove this theorem" and "find a counterexample to this conjecture" are different requests with different optimal strategies. Be explicit about what success looks like.

Provide context. If the result depends on a prior lemma, include the lemma statement and proof (or load it into the War Room corpus).
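The four habits above can be sketched as a small template that assembles a problem statement before it is pasted into a session. This is a minimal illustration, not part of any tool: the `ProblemStatement` class, its field names, and the section labels are all hypothetical, and the sample entries are placeholders to be replaced with your own material.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemStatement:
    """Hypothetical container for the four parts of a well-formed request."""

    goal: str  # what success looks like
    definitions: dict[str, str] = field(default_factory=dict)
    attempts: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the parts into one plain-text block."""
        lines = ["GOAL", self.goal, "", "DEFINITIONS"]
        lines += [f"- {name}: {text}" for name, text in self.definitions.items()]
        lines += ["", "ALREADY TRIED"]
        lines += [f"- {attempt}" for attempt in self.attempts]
        lines += ["", "CONTEXT"]
        lines += [f"- {item}" for item in self.context]
        return "\n".join(lines)


# Placeholder content: substitute your own goal, definitions, and lemmas.
statement = ProblemStatement(
    goal="Prove the claim, or give an explicit counterexample.",
    definitions={
        "limit": "f(x) -> L as x -> a means: for every eps > 0 there is "
        "delta > 0 such that 0 < |x - a| < delta implies |f(x) - L| < eps."
    },
    attempts=["Direct induction on n (describe where it broke down)."],
    context=["Lemma 1 (paste the statement and proof here)."],
)
prompt = statement.render()
print(prompt)
```

Putting the goal first is a design choice: it makes the difference between a proof request and a counterexample request visible before anything else is read.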

Structuring a session

A well-structured 10-round MathLab session might run as follows:

Round 1: State the problem completely, with all definitions, prior results, and context. Ask the AI to confirm its understanding before proceeding.

Rounds 2-4: Explore approaches. Ask the AI to outline 2-3 possible proof strategies, then select the most promising one to develop.

Rounds 5-8: Develop the proof step by step. Each round should advance one major step. Verify each step before building on it.

Round 9: Review the complete argument. Ask the AI to check for gaps, unstated assumptions, or circular reasoning.

Round 10: Clean up. Ask for a formal write-up of the result, or identify what is still missing for the next session.

Handling errors

AI mathematical errors typically fall into three categories:

Computational errors — arithmetic or algebraic mistakes. These are usually easy to catch by verifying specific steps.

Logical errors — invalid inferences, unstated assumptions, or circular reasoning. These are harder to catch and more dangerous. Always ask the AI to justify each logical step explicitly.

Conceptual errors — misunderstanding the problem, misapplying a theorem, or confusing similar concepts. These require domain expertise to detect. If a result feels too easy or too clean, scrutinize it.
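Of the three categories, computational errors are the ones most easily checked by machine. As a sketch, suppose the AI asserts a closed form during a proof; the sum-of-squares identity below is only a stand-in for whatever formula appears in your session, and the helper names are hypothetical. Comparing the claim against a direct computation in exact arithmetic will expose a wrong formula quickly. Agreement on finitely many cases is not a proof, so a clean result means only that no error was found in the range tested.

```python
from fractions import Fraction


def claimed_sum_of_squares(n: int) -> Fraction:
    """Closed form to be checked: n(n+1)(2n+1)/6."""
    return Fraction(n * (n + 1) * (2 * n + 1), 6)


def miscopied_sum_of_squares(n: int) -> Fraction:
    """Deliberately wrong variant (sign slip) to show a failure being caught."""
    return Fraction(n * (n + 1) * (2 * n - 1), 6)


def direct_sum_of_squares(n: int) -> int:
    """Reference value computed term by term."""
    return sum(k * k for k in range(1, n + 1))


def first_mismatch(claimed, direct, upto: int):
    """Return the smallest n in 1..upto where the two disagree, else None."""
    for n in range(1, upto + 1):
        if claimed(n) != direct(n):
            return n
    return None


ok = first_mismatch(claimed_sum_of_squares, direct_sum_of_squares, 200)
bad = first_mismatch(miscopied_sum_of_squares, direct_sum_of_squares, 200)
print("correct formula, first mismatch:", ok)
print("miscopied formula, first mismatch:", bad)
```

Using `Fraction` rather than floating point avoids rounding noise, so any reported mismatch reflects a real disagreement between the formula and the direct sum.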

When you catch an error, do not just ask the AI to fix it. Explain what went wrong and why, then ask for a corrected approach. This produces better results than vague correction requests.

Building multi-session programs

For research that spans multiple sessions:

End each session with a clear summary of what was established
Save the summary as a document
Load it into the War Room corpus for the next session
Start the next session by having the AI review the loaded prior results before proceeding

This creates a research thread that builds systematically, with each session extending the last rather than starting from scratch.
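The first two steps of that routine can be sketched as a small script that writes the end-of-session summary to a file and reads it back. This is an assumption-laden illustration: the JSON layout, the field names, the file name, and the sample entries are all hypothetical, and loading the file into the War Room corpus happens in the product itself, which may expect a different format.

```python
import json
import tempfile
from pathlib import Path


def save_summary(path: Path, established: list[str], open_questions: list[str]) -> None:
    """Write what the session established, plus what remains open, as JSON."""
    record = {"established": established, "open_questions": open_questions}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")


def load_summary(path: Path) -> dict:
    """Read a saved summary back so it can be reviewed before the next session."""
    return json.loads(path.read_text(encoding="utf-8"))


# A temporary directory keeps the example self-contained; in practice you
# would save to a folder you keep between sessions.
with tempfile.TemporaryDirectory() as tmp:
    summary_file = Path(tmp) / "session_01_summary.json"
    save_summary(
        summary_file,
        established=["Lemma 1 holds under the stated definitions."],
        open_questions=["Does the argument extend to the general case?"],
    )
    restored = load_summary(summary_file)

print(restored["established"])
```

Keeping established results separate from open questions makes it easier to tell the AI what it may rely on and what still needs work.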

Matthew J. Goss, Jr.

Retired COMEX/NYMEX floor trader, Goldman Sachs and FlexTrade Systems alumnus, multi-instrumentalist, published author, and independent mathematics researcher. Founder of Quantiterate.