
ICML 2025 Review Controversies Spark Academic Debate

Category: Artificial intelligence & Machine Learning
Tags: icml, icml 2025, icml 2025 conference, review, academic debate, controversies, reject, accept, acceptance rate
10 Posts, 6 Posters, 419 Views
Joanne wrote on 2 May 2025, 20:03 (#1)

    The ICML 2025 acceptance results have recently been announced, marking a historic high of 12,107 valid submissions and 3,260 accepted papers, an acceptance rate of 26.9%. Despite the impressive volume, numerous serious issues in the review process have emerged, sparking extensive discussion within the academic community.
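
    A quick sanity check of the quoted figures (a minimal sketch in Python, assuming the submission and acceptance counts above are exact):

    ```python
    # Verify the headline acceptance rate from the counts quoted in this post.
    accepted = 3260
    valid_submissions = 12107
    print(f"acceptance rate: {accepted / valid_submissions:.1%}")  # -> 26.9%
    ```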

    🔥 Highlighted Issues

    1. Inconsistency between review scores and acceptance outcomes
      Haifeng Xu, Professor at the University of Chicago, observed that review scores at ICML 2025 were oddly disconnected from acceptance outcomes. Of his four submissions, the paper with the lowest average score (2.75) was accepted as a poster, while the three papers with higher scores (3.0) were rejected (a small sketch after this list makes the mismatch concrete).
    2. Positive reviews yet inexplicable rejection
      A researcher from KAUST reported that his submission received uniformly positive reviews, clearly affirming its theoretical and empirical contributions, yet it was rejected without any negative feedback or explanation.
    3. Errors in review-score documentation
      Zhiqiang Shen, Assistant Professor at MBZUAI, highlighted significant recording errors. One paper, clearly rated with two "4" scores, was mistakenly documented in the meta-review as having "three 3's and one 4". Another paper was rejected on the basis of outdated reviewer comments, ignoring the scores reviewers had updated during the rebuttal period.
    4. Unjustified rejection by Area Chair
      Mengmi Zhang, Assistant Professor at NTU, experienced a perplexing case where her paper was rejected by the Area Chair despite unanimous approval from all reviewers, with no rationale provided.
    5. Incomplete review submissions
      A doctoral student from York University reported that incomplete reviews were submitted for his paper, yet the Area Chair cited these incomplete reviews as justification for rejection.
    6. Zero-sum game and unfair review criteria
      A reviewer from UT publicly criticized the reviewing criteria, lamenting overly lenient reviews in the past. He highlighted a troubling trend: submissions not employing at least 30 trillion tokens to train 671B MoE models risk rejection regardless of their theoretical strength.
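
    To make the mismatch in issue 1 concrete, here is a minimal sketch; the individual reviewer scores are hypothetical (only the averages 2.75 and 3.0 and the decisions appear in the post), and it simply counts pairs where a rejected paper out-scored an accepted one:

    ```python
    # Hypothetical reconstruction of the four submissions described in issue 1.
    # Per-reviewer scores are assumed; only the means and decisions are from the post.
    papers = {
        "paper_A": {"scores": [2, 3, 3, 3], "decision": "accept"},  # mean 2.75
        "paper_B": {"scores": [3, 3, 3, 3], "decision": "reject"},  # mean 3.00
        "paper_C": {"scores": [3, 3, 3, 3], "decision": "reject"},  # mean 3.00
        "paper_D": {"scores": [3, 3, 3, 3], "decision": "reject"},  # mean 3.00
    }

    means = {name: sum(p["scores"]) / len(p["scores"]) for name, p in papers.items()}

    # An inversion: a rejected paper whose mean strictly exceeds an accepted paper's mean.
    inversions = [
        (rej, acc)
        for rej, pr in papers.items() if pr["decision"] == "reject"
        for acc, pa in papers.items() if pa["decision"] == "accept"
        if means[rej] > means[acc]
    ]
    print(f"score/decision inversions: {len(inversions)}")  # -> 3 of 3 possible pairs
    ```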

    Additionally, several researchers noted reviews that appeared AI-generated or carelessly copy-pasted, resulting in contradictory feedback.

    🎉 Notable Achievements

    Despite these controversies, several research groups, among others, reported remarkable outcomes:

    • Duke University (Prof. Yiran Chen’s team): 5 papers accepted, including 1 spotlight poster.
    • Peking University (Prof. Ming Zhang’s team): 4 papers accepted for the second consecutive year.
    • UC Berkeley (Dr. Xuandong Zhao): 3 papers accepted.

    💡 Open Discussion

    Given these significant reviewing issues—including reviewer negligence, procedural chaos, and immature AI-assisted review systems—how should top-tier academic conferences reform their processes to ensure fairness and enhance review quality?

    We invite everyone to share your thoughts, experiences, and constructive suggestions!

Joserffrey (Super Users) wrote on 4 May 2025, 03:01 (#2)

It seems that reviewers do not have permission to view the ACs' meta-reviews and the PCs' final decisions this year. As a reviewer, I cannot see the results of the submissions I reviewed.

cqsyf (Super Users) wrote on 7 May 2025, 19:33 (#3)

        My colleague is serving as a Program Committee (PC) member for this year’s ICML. According to her, some individuals were selected as reviewers solely based on having co-authored a previous ICML paper. Upon investigating the backgrounds of certain reviewers who appeared to submit problematic reviews, she discovered that many of them lacked even a bachelor’s degree; for instance, some were first-year undergraduate students 😨 😕 😯

Joserffrey (Super Users) wrote on 7 May 2025, 22:57 (#4)

@cqsyf Perhaps we should prepare ourselves mentally for this to become the norm. AFAIK, NeurIPS'25 already has PhD students as ACs, and undergraduates are even more common as reviewers. This is really terrible.

cocktailfreedom (Super Users) wrote on 8 May 2025, 21:43 (#5)

With submissions increasing at such a pace year over year, I cannot see how this manual review effort can keep working well!!

root wrote on 8 May 2025, 21:51 (#6)

              This thread vividly highlights what seems to be an ironic paradox in the academic community: the more papers we submit, the less time we have left to properly review them!

               Think about it: researchers are now spending countless hours crafting submissions, driving record-breaking numbers at conferences like ICML 2025. Yet this surge in submissions may be directly correlated with declining review quality. It's like we're baking thousands of cakes and then complaining that no one has time to taste them properly. 🍰 🤠

              Perhaps we’re witnessing a "submission-reviewer paradox": the energy invested in authoring more papers inevitably leaves us with fewer resources for thorough and careful reviewing.

              Could the solution be smarter automation, stricter reviewer qualifications, or maybe even rethinking how conferences handle volume altogether ❓
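
root's paradox is easy to put in numbers. A back-of-the-envelope sketch: only the submission count comes from this thread, while the per-paper and per-reviewer loads are assumptions typical of large ML conferences, not official ICML figures:

```python
# Back-of-the-envelope reviewer demand for a conference of this size.
submissions = 12107        # from the opening post
reviews_per_paper = 4      # assumption
papers_per_reviewer = 4    # assumption: a common reviewer load cap

total_reviews = submissions * reviews_per_paper
reviewers_needed = -(-total_reviews // papers_per_reviewer)  # ceiling division
print(f"{total_reviews:,} reviews -> ~{reviewers_needed:,} reviewers needed")
# -> 48,428 reviews -> ~12,107 reviewers needed
```

Under these assumptions, the qualified reviewer pool has to be roughly as large as the submission count itself, which is consistent with the reports above of reviewer recruitment reaching down to undergraduates.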

Joanne wrote on 9 May 2025, 16:56 (#7)

Seriously, "first-year undergraduate students" as reviewers?!

Joanne wrote on 9 May 2025, 17:04 (#8)

EMNLP submissions could skyrocket past 10,000 this year. The speed of this growth is astonishing and reflects just how rapidly the field is expanding. These top-tier conferences attract the best authors and should have the privilege of the most capable reviewers. Hopefully, this won't discourage authors.

2 months later
Sylvia (Super Users) wrote 8 days ago (#9)

                    I put together a structured overview of all 120 oral papers accepted at ICML 2025, categorized by research topic. The summary is aimed at the CS research review community and highlights trends, innovations, and open questions in the field.


                    1. Foundation Models, LLMs, & Multimodal AI

                    • Layer by Layer: Uncovering Hidden Representations in Language Models
                      Explores the structure and semantics of intermediate representations in large language models.

                    • Learning Dynamics in Continual Pre-Training for Large Language Models
                      Studies how continual pre-training affects the learning dynamics and knowledge retention of LLMs.

                    • Emergent Misalignment: Narrow Finetuning can Produce Broadly Misaligned LLMs
                      Shows that targeted finetuning may create unexpected, broad misalignment issues in LLMs.

                    • CollabLLM: From Passive Responders to Active Collaborators
                      Proposes techniques for LLMs to act as proactive, context-aware collaborators.

                    • AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
                      Introduces a dataset and evaluation suite for emotion understanding in multimodal LLMs.

                    • On Path to Multimodal Generalist: General-Level and General-Bench
                      Presents a unified framework and benchmarks for developing generalist multimodal AI models.

                    • EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
                      Proposes a suite for benchmarking multimodal models in embodied agent tasks.

                    • SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
                      Scalable synthetic data pipeline for visual question answering with multimodal LLMs.

                    • Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
                      Benchmarks multimodal reasoning skills in LLMs.

                    • VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
                      Uses synthetic reasoning to learn reward models for multi-domain processes.

                    • Sundial: A Family of Highly Capable Time Series Foundation Models
                      Introduces a family of foundation models for time series data.

                    • Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG
                      Combines retrieval-augmented generation with high-res visual perception.

                    • What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities
                      Introduces a broad benchmark for virtual agents’ key capabilities.


                    2. Representation Learning & Theory

                    • An analytic theory of creativity in convolutional diffusion models
                      Develops a mechanistic, interpretable theory of creativity in diffusion models.

                    • Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
                      Theorizes about the mechanisms behind in-context learning in large models.

                    • Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
                      Discovers universal scaling laws in optimally trained neural networks.

                    • Transformative or Conservative? Conservation laws for ResNets and Transformers
                      Connects conservation laws to deep architectures.

                    • Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
                      Explores “grokking” in non-neural computational models.

                    • Learning with Expected Signatures: Theory and Applications
                      Presents a new mathematical framework for sequential data representations.

                    • General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization
                      Theoretical advances in stochastic optimization.

                    • Equivalence is All: A Unified View for Self-supervised Graph Learning
                      Unifies self-supervised objectives in graph learning under an equivalence framework.

                    • Blink of an eye: a simple theory for feature localization in generative models
                      Theoretical work on feature localization.

                    • Expected Variational Inequalities
                      Introduces variational inequalities in expectation as a new analytical tool.

                    • Partition First, Embed Later: Laplacian-Based Feature Partitioning for Refined Embedding and Visualization of High-Dimensional Data
                      Laplacian-based methods for dimensionality reduction.

                    • Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes
                      New models for dynamic brain network analysis.


                    3. Diffusion, Generative Models & Creativity

                    • VideoRoPE: What Makes for Good Video Rotary Position Embedding?
                      Advances rotary position embeddings for video modeling.

                    • ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
                      Shows how diffusion transformers develop interpretable features.

                    • MGD³: Mode-Guided Dataset Distillation using Diffusion Models
                      Applies diffusion models to dataset distillation.

                    • DeFoG: Discrete Flow Matching for Graph Generation
                      Diffusion-based approaches for graph generative models.

                    • Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
                      Explores token ordering effects in diffusion-based text generation.

                    • Normalizing Flows are Capable Generative Models
                      Revisits normalizing flows for scalable generative modeling.

                    • Score Matching with Missing Data
                      Score-based generative models for incomplete data.


                    4. Optimization, Theory & Algorithms

                    • Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
                      Theoretical analysis of algorithmic problem-solving in neural nets.

                    • An Online Adaptive Sampling Algorithm for Stochastic Difference-of-convex Optimization with Time-varying Distributions
                      Novel optimization algorithms for dynamic settings.

                    • Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
                      Advances in preconditioned optimization.

                    • Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton
                      Analyses the bias in random matrix inversion.

                    • One-Step Generalization Ratio Guided Optimization for Domain Generalization
                      Introduces a new optimization criterion for domain generalization.

                    • Polynomial-Delay MAG Listing with Novel Locally Complete Orientation Rules
                      Graph-theoretic algorithms.

                    • An Improved Clique-Picking Algorithm for Counting Markov Equivalent DAGs via Super Cliques Transfer
                      Faster algorithms for counting Markov equivalence classes.

                    • Near-Optimal Decision Trees in a SPLIT Second
                      Develops new algorithms for fast, near-optimal decision tree learning.

                    • A Generalization Result for Convergence in Learning-to-Optimize
                      Generalization bounds in meta-optimization.

                    • LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won't Fail)
                      Theoretical convergence results for LoRA.

                    • LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
                      Single-step gradient fine-tuning for LLMs.

                    • Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent
                      Implicit regularization in tensor methods.


                    5. Reinforcement Learning, Agents & Decision Making

                    • Multi-agent Architecture Search via Agentic Supernet
                      Automated design of multi-agent systems.

                    • Training a Generally Curious Agent
                      Advances in curiosity-driven exploration.

                    • Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration
                      Methods for safer RL via bias correction.

                    • Temporal Difference Flows
                      Temporal difference learning with flow-based methods.

                    • Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
                      Sparse networks for scalable RL.

                    • Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
                      Transferable cooperation in multi-agent RL.

                    • VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
                      Synthetic reasoning for complex RL reward modeling.

                    • High-Dimensional Prediction for Sequential Decision Making
                      Learning for high-dimensional decision-making tasks.


                    6. Robustness, Safety, Privacy & Security

                    • Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
                      Detects AI-generated images using robust subspace methods.

                    • Position: Certified Robustness Does Not (Yet) Imply Model Security
                      Argues the gap between robustness guarantees and practical security.

                    • Adversarial Inception Backdoor Attacks against Reinforcement Learning
                      Examines vulnerabilities in RL to backdoor attacks.

                    • AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
                      Benchmarks for adversarial example defenses.

                    • Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings
                      Provides new tools for deployment-oriented classifier evaluation.

                    • Auditing f-differential privacy in one run
                      Practical privacy auditing for learning algorithms.

                    • On Differential Privacy for Adaptively Solving Search Problems via Sketching
                      Differential privacy in adaptive search.

                    • Going Deeper into Locally Differentially Private Graph Neural Networks
                      Privacy-preserving learning on graphs.


                    7. Causality, Generalization & Explainability

                    • Position: Not All Explanations for Deep Learning Phenomena Are Equally Valuable
                      Calls for careful evaluation of explanation quality.

                    • Sanity Checking Causal Representation Learning on a Simple Real-World System
                      Evaluates causal representation learning with real data.

                    • Statistical Test for Feature Selection Pipelines by Selective Inference
                      Selective inference in feature selection.

                    • A Generalization Theory for Zero-Shot Prediction
                      New theory for zero-shot generalization.

                    • Statistical Collusion by Collectives on Learning Platforms
                      Examines collective manipulation in ML platforms.

                    • Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning
                      Tackles drift in continual learning scenarios.

                    • Generalization Result for Convergence in Learning-to-Optimize
                      Generalization in meta-learning.


                    8. Scientific Discovery, Mathematics & Symbolic Reasoning

                    • LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
                      Benchmarks scientific equation discovery with LLMs.

                    • Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?
                      ML for conjecturing in mathematics.

                    • Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics
                      Datasets for symbolic mathematical discovery.

                    • From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control
                      Memory-efficient fine-tuning methods for LLMs.


                    9. Vision, Video, Perception & Multimodal

                    • ReferSplat: Referring Segmentation in 3D Gaussian Splatting
                      Novel approach for referring segmentation in 3D.

                    • VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
                      Integrates appearance and motion for video generation.

                    • VideoRoPE: What Makes for Good Video Rotary Position Embedding?
                      (Duplicate with above, retained for emphasis on video modeling.)


                    10. Data, Scaling Laws & Evaluation

                    • Improving the Scaling Laws of Synthetic Data with Deliberate Practice
                      Deliberate practice for synthetic data generation.

                    • Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection
                      Multi-model approaches for subset selection.

                    • Mixture of Lookup Experts
                      Scalable expert mixture models.

                    • Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models
                      Analyzes and identifies bad training samples.

                    • Inductive Moment Matching
                      Moment matching for robust model learning.

                    • Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All
                      ML for combinatorial auctions.


                    11. Policy, Society, and Position Papers

                    • Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
                      Calls for reforms in peer review.

                    • Position: Probabilistic Modelling is Sufficient for Causal Inference
                      Argues for the adequacy of probabilistic modeling for causal inference.

                    • Position: Generative AI Regulation Can Learn from Social Media Regulation
                      Draws parallels between AI and social media regulation.

                    • Position: Current Model Licensing Practices are Dragging Us into a Quagmire of Legal Noncompliance
                      Highlights legal risks in model licensing.

                    • Position: AI Agents Need Authenticated Delegation
                      Argues for delegation mechanisms in AI agents.

                    • Position: AI Safety should prioritize the Future of Work
                      Suggests work-focused priorities for AI safety.

                    • Position: Medical Large Language Model Benchmarks Should Prioritize Construct Validity
                      Pushes for rigorous benchmarking in medical AI.

                    • Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
                      Empirical evaluation via competitions.

                    • Position: Principles of Animal Cognition to Improve LLM Evaluations
                      Inspiration from animal cognition for evaluation.

                    • Position: Political Neutrality in AI Is Impossible — But Here Is How to Approximate It
                      Discusses challenges and solutions for political neutrality in AI.


                    12. Miscellaneous: Specialized Models & Systems

                    • Rényi Neural Processes
                      Probabilistic neural processes with Rényi divergences.

                    • The dark side of the forces: assessing non-conservative force models for atomistic machine learning
                      Physics-inspired ML models.

                    • AutoGFM: Automated Graph Foundation Model with Adaptive Architecture Customization
                      Foundation models for graphs.

                    • ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
                      IT automation evaluation suite.

                    • STAIR: Improving Safety Alignment with Introspective Reasoning
                      Safety alignment via introspective reasoning.

                    • Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings
                      Deployment-oriented classifier evaluation.


                    This summary omits author details for brevity and focuses solely on research content and topics.

root wrote 5 days ago (#10)

                      Here's a glimpse of some truly remarkable work recognized this year:

                      🏆 Outstanding Papers:

                      • Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
                        Vaishnavh Nagarajan, Chen Wu, Charles Ding, Aditi Raghunathan

                      • The Value of Prediction in Identifying the Worst-Off
                        Unai Fischer Abaigar, Christoph Kern, Juan Perdomo

                      • Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
                        Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, Sitan Chen

                      • Score Matching with Missing Data
                        Josh Givens, Song Liu, Henry Reeve

                      • CollabLLM: From Passive Responders to Active Collaborators
                        Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao

                      • Conformal Prediction as Bayesian Quadrature
                        Jake Snell, Thomas Griffiths

                      🔖 Outstanding Position Papers:

                      • AI Safety should prioritize the Future of Work
                        Sanchaita Hazra, Bodhisattwa Prasad Majumder, Tuhin Chakrabarty

                      • The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
                        Jaeho Kim, Yunseok Lee, Seulki Lee

                      👉 See all awards and details here
