AAAI First to Pilots AI-Assisted Peer Review, Stirring Global Academia
-
On May 16 the Association for the Advancement of Artificial Intelligence (AAAI) announced that its AAAI‑26 conference will invite large language models (LLMs) into the review pipeline as “supplementary reviewers” and “discussion‑note assistants.” It is the first time a top‑tier AI conference has institutionalised generative AI at scale inside the formal peer‑review chain, heralding a new era of human AI co-review.
Pilot Plan: AI Gets a Seat but Doesn’t Grab the Wheel
-
Two touch‑points
- Extra first‑round review : An LLM will file a parallel review that appears next to the reports from at least two human reviewers.
- Discussion summariser : During reviewer debates the LLM will distil points of agreement and disagreement for the Senior Program Committee.
-
Four bright lines
- No reduction in human‑reviewer head‑count.
- No numerical scoring from the LLM.
- No automated accept/reject decisions.
- Every AI output must pass a human sanity‑check.
-
Official stance
AAAI President Stephen Smith calls the move “a careful, step‑by‑step experiment” designed to augment, not replace, human judgement.
Global Academia: Sweet Efficiency Meets Integrity Jitters
Focus What Enthusiasts Say What Skeptics Worry About Efficiency LLMs can weed out weak submissions and spit out tidy outlines, easing reviewer overload. Over‑reliance may flatten nuance and encourage rubber‑stamp reviews. Quality Early surveys show ~40 % of authors find AI reviews as helpful—if not more. Hallucinations & bias could creep in, parroting author claims or sowing errors. Ethics Private deployments keep data under wraps. No universal rules yet for attribution and confidentiality safeguards. - A March feature in Nature called LLM involvement “irreversible,” yet warned the peer‑review social contract could fray.
- A new Springer poll of interdisciplinary reviewers liked the tidy formatting but flagged “black‑box” bias risks.
- TechCrunch reported some scholars accusing AI startups of “PR hijacking,” urging tougher disclosure rules.
Alarm Bells Ringing
- The Paper spotlighted work from Shanghai Jiao Tong University showing that invisible prompts inside a manuscript can dramatically boost an LLM’s score—opening doors to manipulation, hallucination and prestige bias.
- Zhiyuan Community recapped the ICLR‑2025 AI‑feedback trial: 12,222 suggestions adopted, review quality clearly up—yet organisers kept AI firmly in the feedback‑only lane.
Early Lessons: ICLR’s “Reviewer‑Feedback Bot”
Source: https://cspaper.org/topic/52/your-review-may-have-been-co-authored-by-ai.-iclr-2025
Random LLM feedback was injected into 40 000+ ICLR‑2025 reviews. 26.6 % of reviewers tweaked their write‑ups accordingly. Blind grading found 89 % of those edits improved review quality, while acceptance rates stayed statistically unchanged, encouraging signs for controlled human‑AI co‑evaluation.
Observer Sound‑bite
“With submission numbers growing by double digits every year and reviewers flirting with burnout, AI was bound to pull up a chair. The real challenge is whether transparency, accountability and diverse oversight tag along.”
— Elaine Harris, independent publishing‑ethics scholar
What’s Next: Full Report in Six Months
AAAI will publish a deep‑dive after the conference, covering LLM–human agreement, bias patterns and any sway on accept/reject decisions. Meanwhile, journals and societies are drafting joint frameworks on AI‑use disclosure, data isolation and model‑version locking—aiming to balance the efficiency boom with scholarly integrity.
Heads‑up: Authors and reviewers should watch for AAAI’s July draft of the LLM Reviewer Code of Conduct to stay on top of compliance details.
-