Abstract
Self-improving agents are often framed as recursive systems that discover their own improvement procedures. For high-stakes vertical NLP systems, however, the bottleneck is often not autonomy but the placement of expert knowledge: domain specialists can supply rubrics, failure modes, calibration examples, and deployment constraints that an open search loop would otherwise need to rediscover. We present SIRA (self-improving review agent), an expert-bootstrapped agent factory for scientific peer-review support. SIRA keeps the online reviewer and common execution harness fixed, while offline iterations edit only venue-specific artifacts: rubrics, metadata, prompts, templates, calibration rules, benchmark packs, and failure analyses. On a paper-review agent-creation task, SIRA achieves a mean best held-out decision-label accuracy of 0.941 over five runs, compared with 0.865 for a HyperAgents-style open editable-agent baseline under the same dataset split and metric; it also reaches its best candidate in roughly one third as many scored steps. The claim is bounded but sharp: in peer-review support, self-improvement can be strongest when experts shape the search space first and iteration is restricted to auditable, versioned factory artifacts.
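The headline metric can be sketched as follows. This is a minimal illustration, assuming that "mean best held-out decision-label accuracy" means taking each run's best-scoring candidate on the held-out split and averaging those maxima across runs. The function names (`decision_accuracy`, `best_step`, `mean_best`) and the toy scores are hypothetical and are not taken from the SIRA implementation or its results.

```python
from statistics import mean


def decision_accuracy(predicted, gold):
    """Fraction of held-out papers whose predicted decision label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def best_step(step_scores):
    """Return (best score, 1-based index of the first scored step that reaches it)."""
    best = max(step_scores)
    return best, step_scores.index(best) + 1


def mean_best(runs):
    """Average each run's best held-out score across all runs."""
    return mean(best_step(scores)[0] for scores in runs)


# Placeholder per-step scores for two hypothetical runs (NOT results from the paper).
toy_runs = [
    [0.5, 0.75, 0.75],
    [0.25, 0.5, 1.0],
]

toy_accuracy = decision_accuracy(
    ["accept", "reject", "reject", "accept"],
    ["accept", "reject", "accept", "accept"],
)
toy_mean_best = mean_best(toy_runs)
toy_first_best = best_step(toy_runs[0])
```

Under this reading, the "scored steps to best candidate" comparison corresponds to the second element returned by `best_step`, averaged or compared across systems in the same way.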
Keywords
Citation
@article{Cao2026Experts,
title={Experts First, Iteration Second: Auditable Self-Improvement for Scientific Peer-Review Agents},
author={Lele Cao and Xin Huang and Lei You},
year={2026},
url={https://cspaper.org/openprint/20260601.0001v1},
journal={OpenPrint:20260601.0001v1}
}
Version History
| Version | Released Date | Submitter |
|---|---|---|
| v1 (Current) | Jun 1, 2026 | Lele Cao |
