🔥 ICML 2025 Review Results are Coming! Fair or a Total Disaster? 🤯
-
If you feel upset, check this paper out
-
Just heard this from a fellow researcher who's reviewing for ICML 2025:
"... They keep enforcing mandatory reviews for authors, ... The review process has gotten way too complicated - each paper requires filling out over ten different sections. It's already unpaid labor, and now it feels like they're squeezing reviewers dry. Honestly, this kind of over-engineered reform is making things worse, not better. Review quality is only going to keep declining if it keeps going this way."
Yikes. Anyone else feeling or hearing the same?
@lelecao said in
ICML 2025 Review Results are Coming! Fair or a Total Disaster? 🤯:
Just heard this from a fellow researcher who's reviewing for ICML 2025:
"... They keep enforcing mandatory reviews for authors, ... The review process has gotten way too complicated - each paper requires filling out over ten different sections. It's already unpaid labor, and now it feels like they're squeezing reviewers dry. Honestly, this kind of over-engineered reform is making things worse, not better. Review quality is only going to keep declining if it keeps going this way."
Yikes. Anyone else feeling or hearing the same?
As far as I know, peer review should stay unpaid. Paying reviewers could introduce serious bias and unfairness, even though everyone loves money, you and me included.
What other ways are there to simplify the process?
-
ICML, also known as "I Cannot Manage Life." The world's most famous reviewer-torture conference, held annually. You submit one paper, review five. Doesn't matter if it's not your area; you'll have to figure it out anyway. Comments need to be detailed, long, and exhaustive. Finishing one review basically feels like writing half a paper. No money for reviews, just dedicating all your pure love. And after all those late-night comments? Guess what: the AC might not even consider them, and even a paper with all-positive reviews can still get rejected.
-
I posted more astonishingly funny reviews here:
https://cspaper.org/topic/26/the-icml-25-review-disaster-what-does-k-in-k-nn-mean
-
Score triples reported across the community: 3 3 4 · 2 2 2 · 2 3 4 · 1 2 4 · 3 3 5 · 2 2 3 · 3 4 4 · 4 4 5 · 2 3 3 · 1 2 2
-
ICML 2025 sample paper scores reported by the community:
| Paper / Context | Scores | Notes |
| --- | --- | --- |
| Theoretical ML paper | 4 4 4 3 | Former ICLR desk-reject; ICML gave higher scores, hopeful after rebuttal. |
| Attention alternative | 3 2 1 2 | Lacked compute to run LLM benchmarks as requested by reviewers. |
| GNN Paper #1 | 2 2 2 2 | Reviewer misunderstanding; suggested irrelevant datasets. |
| GNN Paper #2 | 2 1 1 2 | Criticized for not being SOTA despite novelty. |
| Multilingual LLM | 1 1 2 3 | Biased reviewer compared with own failed method. |
| FlashAttention misunderstanding | 1 2 2 3 | Reviewer misread implementation; lack of clarity blamed. |
| Rebuttal-acknowledged paper | 4 3 2 1 → 4 3 2 2 | Reviewer accepted corrected proof. |
| Real-world method w/o benchmarks | 3 3 3 2 | Reviewer feedback mixed; lacks standard benchmarks. |
| All ones | 1 1 1 | Author considering giving up; likely reject. |
| Mixed bag (NeurIPS resub) | 2 2 1 | Reviewer ignored results clearly presented in own section. |
| Exhaustive range | 2 3 4 5 | "Only needed a 1 to collect all scores." |
| Borderline paper (Reddit) | 2 3 5 5 | Rejections previously; hopeful this time. |
| Balanced but low | 3 2 2 2 | Reviewer feedback limited; author unsure of chances. |
| Another full range | 1 3 5 | Author confused by extremes; grateful but puzzled. |
| Extra reviews | 1 2 3 3 3 | One adjusted score during rebuttal; one reviewer stayed vague. |
| Flat scores | 3 3 3 3 | Uniformly weak accept, uncertain accept probability. |
| High variance | 4 4 3 1 | Strong and weak opinions; outcome unclear. |
| Review flagged as LLM-generated | 2 1 3 3 | LLM tools flagged 2 reviews as possibly AI-generated. |
| Weak accept cluster | 3 3 2 | Reviewers did not check proofs or supplementary material. |
| Very mixed + LLM suspicion | 2 3 4 1 2 | Belief that two reviews are unfair / LLM-generated. |
| Lower tail | 2 2 1 1 | Reviewer comments vague; possible LLM usage suspected. |
| Low-medium range | 1 2 3 | Concerns reviewers missed paper's main points. |
| Long tail + unclear review | 3 2 2 1 | Two willing to adjust; one deeply critical with little justification. |
| Slightly positive | 4 3 2 | Reviewer praised work but gave 2 anyway. |
| Mixed high | 4 2 2 5 | Confusing mix, but "5" may pull weight. |
| Middle mix | 2 2 4 4 | Reviewers disagree on strength; AC may play key role. |
| More reviews than expected | 3 3 3 2 2 2 | Possibly emergency reviewers assigned. |
| Strong first reviewer | 3 2 2 | Others gave poor-quality reviews; unclear chance. |
| Pessimistic mix | 3 2 1 | Reviewer willing to increase, but others not constructive. |
| Hopeless mix | 1 2 2 3 | Reviewer missed key ideas; debating NeurIPS resub. |
| Offline RL | 2 2 2 | Still decided to rebut, but not enough space for additional results. |
| Counterfactual exp. | 1 2 2 3 | Got 7 7 8 8 from ICLR yet still rejected by ICLR 2025! This time the scores are ridiculous! |
-
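A quick way to see how noisy these reported scores really are is to compute each paper's mean and spread. A minimal Python sketch, using a few score sets transcribed from the table above (the labels are the informal ones used there, not official paper titles):

```python
from statistics import mean, pstdev

# A few of the score sets reported above, transcribed by hand.
reported = {
    "Flat scores": [3, 3, 3, 3],
    "High variance": [4, 4, 3, 1],
    "Exhaustive range": [2, 3, 4, 5],
    "Very mixed + LLM suspicion": [2, 3, 4, 1, 2],
}

for label, scores in reported.items():
    # Population std-dev: 0.0 means the reviewers agreed exactly.
    print(f"{label}: mean={mean(scores):.2f}, spread={pstdev(scores):.2f}")
```

On these examples the spread runs from 0.0 ("Flat scores") to well past 1.0 on a 1-to-5 scale, which is the same disagreement the thread complains about below.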
ICML 2025 Review: Most Outstanding Issues
Sources are labeled where applicable.
1. Incomplete / Low-Quality Reviews
- Several submissions received no reviews at all (Zhihu).
- Single-review papers despite multi-review policy.
- Some reviewers appeared to skim or misunderstand the paper.
- Accusations that reviews were LLM-generated: generic, hallucinated, overly verbose (Reddit).
2. Unjustified Low Scores
- Reviews lacked substantive critique but gave 1 or 2 scores without explanation.
- Cases where positive commentary was followed by a low score (e.g., "Good paper" + score 2).
- Reviewers pushing personal biases (e.g., "you didn't cite my 5 papers").
3. Domain Mismatch
- Theoretical reviewers assigned empirical papers and vice versa (Zhihu).
- Reviewers struggling with areas outside their expertise, leading to incorrect comments.
4. Rebuttal System Frustrations
- 5000-character rebuttal limit per reviewer too short to address all concerns.
- Markdown formatting restrictions (e.g., no multiple boxes, limited links).
- Reviewers acknowledged rebuttal but did not adjust scores.
- Authors felt rebuttal phase was performative rather than impactful.
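Given the 5,000-character-per-reviewer cap mentioned above, it helps to check a draft's length before pasting it into the form. A minimal sketch; the limit is the figure reported in this thread, and `check_rebuttal` is a hypothetical helper, not part of any OpenReview API:

```python
REBUTTAL_CHAR_LIMIT = 5000  # per-reviewer cap as reported in this thread

def check_rebuttal(text: str, limit: int = REBUTTAL_CHAR_LIMIT) -> str:
    """Report how much of the character budget a rebuttal draft uses."""
    used = len(text)
    if used > limit:
        return f"Over by {used - limit} characters; trim before submitting."
    return f"OK: {used}/{limit} characters ({limit - used} left)."

# Example: a long draft that blows past the budget.
draft = "We thank Reviewer 2 for the detailed comments. " * 120
print(check_rebuttal(draft))
```

Note the count is raw characters, not rendered length, so Markdown syntax (bold markers, table pipes) eats into the budget too.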
5. Bureaucratic Review Process
- Reviewers forced to fill out many structured fields: "claims & evidence", "broader impact", etc.
- Complaint: "Too much form-filling, not enough science" (Zhihu).
6. Noisy and Arbitrary Scoring
- Extreme score variance within a single paper (e.g., 1/3/5).
- Scores didn't align with review contents or compared results.
- Unclear thresholds and lack of transparency in AC decision-making.
7. Suspected LLM Reviews (Reddit-specific)
- Reviewers suspected of using LLMs to generate long, vague reviews.
- Multiple users ran reviews through tools like GPTZero / DeepSeek and got LLM flags.
8. Burnout and Overload
- Reviewers overloaded with 5 papers, many outside comfort zone.
- No option to reduce load, leading to surface-level reviews.
- Authors and reviewers both expressed mental exhaustion.
9. Review Mismatch with Paper Goals
- Reviewers asked for experiments outside scope or compute budget (e.g., run LLM baselines).
- Demands for comparisons against outdated or irrelevant benchmarks.
10. Lack of Accountability / Transparency
- Authors wished for reviewer identity disclosure post-discussion to encourage accountability.
- Inconsistent handling of rebuttal responses across different ACs and tracks.
-
Even if a rebuttal is detailed and thorough, reviewers often only ACK without changing the score. This usually means they accept your response but don't feel it shifts their overall assessment enough. Some see added experiments as "too late" or not part of the original contribution. Others may still not fully understand the paper but won't admit it. Unfortunately, rebuttals prevent score drops more often than they raise scores.