
Artificial Intelligence & Machine Learning

Discuss peer review challenges in AI/ML research — submission, review quality, bias, and decision appeals at ICLR, ICML, NeurIPS, AAAI, IJCAI, AISTATS and COLT.

This category can be followed from the open social web via the handle ai-ml@cspaper.org

12 Topics 35 Posts
  • 2 Votes
    16 Posts
    1k Views
    SylviaS
    ICML 2025 Review – Most Outstanding Issues (sources labeled where applicable)
    1. 🧾 Incomplete / Low-Quality Reviews
       • Several submissions received no reviews at all (Zhihu).
       • Single-review papers despite the multi-review policy.
       • Some reviewers appeared to skim or misunderstand the paper.
       • Accusations that reviews were LLM-generated: generic, hallucinated, overly verbose (Reddit).
    2. Unjustified Low Scores
       • Reviews lacked substantive critique but gave scores of 1 or 2 without explanation.
       • Cases where positive commentary was followed by a low score (e.g., "Good paper" + score 2).
       • Reviewers pushing personal biases (e.g., "you didn't cite my 5 papers").
    3. 🧠 Domain Mismatch
       • Theoretical reviewers assigned empirical papers, and vice versa (Zhihu).
       • Reviewers struggling with areas outside their expertise, leading to incorrect comments.
    4. Rebuttal System Frustrations
       • The 5000-character rebuttal limit per reviewer was too short to address all concerns.
       • Markdown formatting restrictions (e.g., no multiple boxes, limited links).
       • Reviewers acknowledged rebuttals but did not adjust scores.
       • Authors felt the rebuttal phase was performative rather than impactful.
    5. 🪵 Bureaucratic Review Process
       • Reviewers forced to fill out many structured fields: "claims & evidence", "broader impact", etc.
       • Complaint: "Too much form-filling, not enough science" (Zhihu).
    6. Noisy and Arbitrary Scoring
       • Extreme score variance within a single paper (e.g., 1/3/5).
       • Scores did not align with review contents or with comparable results.
       • Unclear thresholds and lack of transparency in AC decision-making.
    7. Suspected LLM Reviews (Reddit-specific)
       • Reviewers suspected of using LLMs to generate long, vague reviews.
       • Multiple users ran reviews through tools like GPTZero / DeepSeek and got LLM flags.
    8. Burnout and Overload
       • Reviewers overloaded with 5 papers, many outside their comfort zone.
       • No option to reduce load, leading to surface-level reviews.
       • Authors and reviewers both expressed mental exhaustion.
    9. Review Mismatch with Paper Goals
       • Reviewers asked for experiments outside the paper's scope or compute budget (e.g., running LLM baselines).
       • Demands for comparisons against outdated or irrelevant benchmarks.
    10. Lack of Accountability / Transparency
       • Authors wished for reviewer identity disclosure post-discussion to encourage accountability.
       • Inconsistent handling of rebuttal responses across different ACs and tracks.
  • 0 Votes
    1 Posts
    9 Views
    No one has replied
  • The ICML'25 Review Disaster: "What Does 'k' in k-NN Mean?" 😱

    1
    0 Votes
    1 Posts
    26 Views
    No one has replied
  • 0 Votes
    1 Posts
    36 Views
    No one has replied
  • ICLR 2025 Asks Authors: “How Were Your Reviews?”

    iclr 2025 feedback
    1
    1
    0 Votes
    1 Posts
    64 Views
    No one has replied
  • 0 Votes
    3 Posts
    42 Views
    L
    The ICBINB workshop webpage has a section about this, "AI-Generated Papers", in its reviewer guidelines: https://sites.google.com/view/icbinb-2025/reviewer-guidelines
  • The Rejection of The Mamba Paper at ICLR 2024: What Happened?

    iclr retro 2024
    3
    0 Votes
    3 Posts
    121 Views
    cqsyfC
    And ... Mamba2 was apparently accepted at ICML 2024
  • 1 Votes
    3 Posts
    69 Views
    rootR
    @xiaolong yeah, right? Hopefully we do not hear such news in 2025 and onwards…
  • 1 Votes
    1 Posts
    31 Views
    No one has replied
  • 0 Votes
    3 Posts
    65 Views
    L
    Zochi is NOT the first AI-driven scientific research platform. Last year, Llion Jones, one of the original creators of the Transformer architecture, founded Sakana AI and launched an automated research platform straightforwardly named "AI Scientist", which has already evolved to its 2nd generation. Interestingly, a paper produced by AI Scientist-v2 also passed peer review at this year's ICLR workshop on ICBINB, receiving scores of 6/7/6. It is important to note, however, that workshop acceptance criteria typically differ from those of the main ICLR conference, with workshop acceptance rates roughly two to three times higher. Despite these acceptances, controversy around AI-driven research persists, and even successful AI-generated papers risk being withdrawn before formal publication. Intology (the creators of Zochi) have explicitly stated that, in the interest of preserving academic integrity, "AI should not be credited as an author in academic work", and they are currently discussing with workshop organizers whether and how these results should be presented to the research community. Furthermore, according to Sakana's internal assessments using main-conference-level standards, the AI Scientist-v2 paper failed to meet acceptance criteria. This aligns with Intology's own NeurIPS-based automated evaluation, which gave AI Scientist-v2 an average score below 4, actually worse than its predecessor. Zochi's performance clearly outshines that of AI Scientist-v2, yet whether its research would succeed at the main-conference level remains to be seen.
  • 0 Votes
    1 Posts
    24 Views
    No one has replied
  • 0 Votes
    1 Posts
    32 Views
    No one has replied