How Does the Evaluation Work?
Each evaluation cycle administers the same 100 questions to multiple AI models. Models answer independently and cannot see each other's responses. Each response is scored on the rubric dimensions. Individual scores are aggregated into per-model totals, and cross-model consensus is then computed. The process is fully automated and repeatable: every cycle uses the same questions, the same rubric, and the same methodology.
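The aggregation step can be illustrated with a minimal sketch. The details here are assumptions, not the actual pipeline: the nested `scores` layout, the dimension names `accuracy` and `clarity`, the function names, and the choice to express consensus as the per-question mean with the population standard deviation as a disagreement measure are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical data shape: scores[model][question_id][dimension] = numeric score.
# The dimension names and values below are illustrative only.
scores = {
    "model_a": {"q1": {"accuracy": 4, "clarity": 5}, "q2": {"accuracy": 2, "clarity": 3}},
    "model_b": {"q1": {"accuracy": 4, "clarity": 3}, "q2": {"accuracy": 2, "clarity": 3}},
}


def per_model_totals(scores):
    """Sum each model's rubric scores across all questions and dimensions."""
    return {
        model: sum(sum(dims.values()) for dims in questions.values())
        for model, questions in scores.items()
    }


def question_consensus(scores):
    """Return (mean, spread) of the models' summed scores for each question.

    Assumption: consensus is modeled as the mean, and disagreement as the
    population standard deviation. The real method may differ.
    """
    per_question = {}
    for questions in scores.values():
        for qid, dims in questions.items():
            per_question.setdefault(qid, []).append(sum(dims.values()))
    return {qid: (mean(vals), pstdev(vals)) for qid, vals in per_question.items()}


totals = per_model_totals(scores)
consensus = question_consensus(scores)
```

In this sketch a spread of zero means the models received identical summed scores on that question, while a larger spread flags a question where they diverged.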