-
Peer Review in Computer Science: good, bad & broken
Discuss everything about peer review in computer science research: its successes, failures, and the challenges in between.
-
Artificial Intelligence & Machine Learning
Discuss peer review challenges in AI/ML research — submission, review quality, bias, and decision appeals at ICLR, ICML, NeurIPS, AAAI, IJCAI, AISTATS and COLT.
-
Computer Vision, Graphics & Robotics
Discuss peer review challenges, submission experiences, decision fairness, reviewer quality, and biases at CVPR, ICCV, ECCV, VR, SIGGRAPH, EUROGRAPHICS, ICRA, IROS, RSS etc.
-
Natural Language Processing
Discuss peer review, submission experiences, and decision challenges for NLP research at ACL, EMNLP, NAACL, and COLING.
-
Software Engineering & System Software
Discuss peer review challenges and submission experiences at ICSE, OSDI, SOSP, POPL, PLDI, FSE/ESEC, ISSTA, OOPSLA and ASE.
-
Web, Multimedia & Information Retrieval
Anything around peer review for conferences such as SIGIR, WWW, ICMR, ICME, ECIR, ICASSP and ACM MM.
-
Anonymous Sharing & Supplementary Materials
Anonymously share data, results, or materials. Useful for rebuttals, blind submissions and more. Only unverified users can post (and edit or delete anytime afterwards).
Just noticed that ICML 2025 has taken a small but meaningful step toward open review: the reviews of accepted papers will eventually be made public. While this isn't full-fledged open review yet, it's a clear signal that change is coming.
As a reviewer myself, I felt overwhelmed by the sheer volume of submissions this year, and I also noticed a clear drop in quality. Some papers were clearly submitted in a "let's try our luck" fashion. In this context, I sincerely hope that top AI/ML conferences will eventually follow ICLR's model and adopt fully open peer review.
Why Open Review Matters
For Reviewers: Knowing that reviews will be public adds a layer of accountability. It encourages more thoughtful, constructive, and responsible feedback. No more careless 1-scores or copy-pasted comments.
For Authors: When reviews are public, authors will think twice before submitting undercooked ideas. Fear of negative reviews being visible online can act as a natural filter to avoid "lottery-style" submissions.
For the Community: Public reviews help newcomers learn how to write better papers and better reviews. It also reduces the burden on reviewers caused by the "Fibonacci submission strategy" (endless revise-and-resubmits across top venues), and ultimately improves the quality of accepted papers.
Final Thoughts
Open review isn't a silver bullet, but in this era of exploding submission numbers, it’s a change worth pursuing. I hope to see more top-tier conferences move toward transparent and accountable reviewing, bringing the focus back to research quality, not just acceptance rates.
We’re excited to introduce a new category on cspaper.org:
Anonymous Sharing & Supplementary Materials
Purpose
This category is designed specifically for anonymized sharing of supplementary materials such as:
Additional experimental results or figures
Extended ablation studies
Links to datasets or demos
Supplementary explanations that didn’t fit in the main paper
It’s especially useful during rebuttals, when you may need to share extra content with reviewers but remain compliant with anonymity and strict page limits.
How It Works
Only unverified (but registered) users can post in this category, so skip entering your email when you register.
You can edit or delete your post anytime
Use a username that doesn’t reveal your identity
Share the link to your anonymous supplementary materials post/topic
This keeps everything in line with double-blind peer review policies while giving you a legitimate way to share supporting materials.
Stay Anonymous
Please do not use real names or affiliations when posting.
If you're unsure how to create an anonymous username, try one of these generators:
Jimpix Random Username Generator
UsernameGenerator.com
NordPass Username Generator
Dashlane Username Generator
Example Use Case
You're writing the rebuttal for a paper under review at NeurIPS/ICML/ACL.
You need to trim your response to fit the rebuttal length limit, but you still want to share detailed results or code with the reviewers.
Just post them anonymously in that category and include the link in your response!
Pro Tip
You can make your URL even shorter by using services like TinyURL or Bitly, for example as sketched below.
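For instance, here is a minimal Python sketch using TinyURL's public api-create.php endpoint (assuming that endpoint is still available and requires no API key; Bitly offers an authenticated API instead):
import requests

def shorten(long_url: str) -> str:
    """Return a shortened URL from TinyURL's public endpoint (assumed available, no API key)."""
    resp = requests.get("https://tinyurl.com/api-create.php", params={"url": long_url}, timeout=10)
    resp.raise_for_status()
    return resp.text.strip()

print(shorten("https://cspaper.org/category/10/anonymous-sharing-supplementary-materials"))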
Start Posting
Visit the category here and share your materials:
https://cspaper.org/category/10/anonymous-sharing-supplementary-materials
Nature’s latest article dives into a topic that’s both timely and controversial for the Computer Science research community:
How Artificial Intelligence is Changing Peer Review
Why CS Researchers Should Care
The peer review process is foundational to how we validate and publish research in venues like NeurIPS, ICML, CVPR, and ACL. Now, AI tools — including LLMs — are being used to:
Detect issues in code, datasets, and statistics
Polish academic writing and reviewer comments
Generate structured reviewer feedback
As developers and users of these technologies, we’re not just spectators — we’re stakeholders in how AI shapes peer review.
🧠 Highlights from the Article
Automation meets evaluation: Editors are using AI to support reviewers, flag ethical issues, and accelerate turnaround times.
Rising concerns:
Dilution of human judgment and domain expertise
Risk of leaking confidential submissions to third-party AI tools
Over-reliance on black-box systems in scholarly assessment
Calls for transparency: Journals are starting to draft policies, but the CS community still operates largely in a gray area.
Open Questions for CS Conferences
As AI becomes embedded in our research workflows, what happens to the review process itself?
Should CS conferences allow or restrict AI-generated reviews?
How do we audit the use of AI tools in paper writing and reviewing?
Is it time to develop community guidelines for LLM usage in peer review?
Join the Discussion
Have you seen AI-assisted reviews in action? Used ChatGPT to clarify your own review comments?
Let’s talk:
🧭 What role should AI play in the future of peer review?
How do we protect the integrity of double-blind reviewing?
Can we strike a balance between efficiency and ethics?
Share your experiences and thoughts below.
Source: Nature — How Artificial Intelligence is Changing Peer Review, March 27, 2025
Originally posted on Bluesky by Serge Belongie (@serge.belongie.com), this timely thread opens a conversation about decentralizing paper presentations at NeurIPS.
🧭 The Poll That Sparked a Discussion
Serge Belongie (University of Copenhagen) and Søren Hauberg (DTU) are exploring a simple but profound question:
Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US), if this was an option?
They’re collecting data to gauge interest in offering official NeurIPS paper presentations at a European location, without requiring long-haul travel to the US.
Link to participate in the poll (Google Form)
🧠 Why This Matters
Several compelling motivations were outlined:
Environmental Impact: Transatlantic flights carry a heavy carbon footprint.
Family Considerations: Long travels can be difficult for researchers with children.
US Entry Concerns: Visa policies and border control have become increasingly strict and unpredictable.
Conference Scale: NeurIPS has grown large — some argue, too large for meaningful in-person interaction.
🧩 Community Reactions
The thread generated a flurry of responses from researchers around the world:
Concerns about fragmentation:
“Splitting the community weakens the social connections.” — Kosta Derpanis
Necessity due to constraints:
“Many US-based researchers fear not being let back in if they travel.” — David Marx
Existing precedents:
“Since the pandemic, we’ve had NeurIPS@Paris and CVPR@Paris satellite meetups.” — David Picard
Institutional support growing:
IJCAI 2025 will officially include a satellite event in Guangzhou, China to accommodate researchers unable to enter Canada.
→ IJCAI Announcement
Feasibility discussions already underway:
“There is discussion of a NeurIPS 2025 satellite outside the US but in a similar timezone.” — Thomas Dietterich
Implications for Peer Review & Conference Design
This idea challenges the default notion of “main venue” = “main stage.” As the global AI/ML community grows, inclusivity and accessibility in paper presentation become just as critical as peer-review rigor. Should presenting a NeurIPS paper require thousands of dollars and international visas?
While decentralization may create new logistical and identity challenges, it could also lead to:
More inclusive participation
Better environmental stewardship
Stronger regional AI communities
What Do You Think?
Would a European satellite venue dilute the prestige of NeurIPS, or increase its relevance and reach?
Is it time for flagship CS conferences to rethink hybrid and distributed formats, especially post-pandemic?
Drop your thoughts below.
Special thanks to @serge.belongie.com for initiating this much-needed discussion.
The Verdict: ACL 2025 Review Scores Decoded
This year’s Overall Assessment (OA) descriptions reveal a brutal hierarchy:
5.0 "Award-worthy (top 2.5%)"
4.0 "ACL-worthy"
3.5 "Borderline Conference"
3.0 "Findings-tier" (Translation: "We’ll take it… but hide it in the appendix")
1.0 "Do not resubmit" (a.k.a. "Burn this and start over")
Pro tip: a 3.5+ OA average likely means the main conference; 3.0+ scrapes into Findings. Meta-reviewers now hold life-or-death power: one 4.0 can save a 3.0 from oblivion.
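If you want to sanity-check where your scores land, here is a tiny Python sketch of that heuristic (the thresholds are the community folklore quoted above, not official ACL policy):
def acl_outcome(overall_scores):
    """Map an average Overall Assessment to the folk-wisdom outcome above (not official policy)."""
    avg = sum(overall_scores) / len(overall_scores)
    if avg >= 3.5:
        return avg, "likely main conference"
    if avg >= 3.0:
        return avg, "likely Findings"
    return avg, "likely reject"

print(acl_outcome([3.0, 3.5, 4.0]))  # (3.5, 'likely main conference')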
Nightmare Fuel: The 6-Reviewer Special
"Some papers got 6 reviewers—likely because emergency reviewers were drafted last-minute. Imagine rebutting 6 conflicting opinions… while praying the meta-reviewer actually reads your response."
Rebuttal strategy:
2.0? "Give up." (Odds of salvation: ~0%)
2.5? "Worth a shot."
3.0? "Fight like hell."
The ARR Meat Grinder Just Got Worse
New changes to the ARR (Academic Rebuttal Rumble):
5 cycles/year now (April’s cycle vanished; June moved to May).
EMNLP’s deadline looms closer — less time to pivot after ACL rejections.
LLM stampede: "8,000+ submissions per ARR cycle!"
"Back in the day, ACL had 3,000 submissions. No Findings, no ARR, no LLM hype train. Now it’s just a content farm with peer review."
How to Survive the Madness
Got a 3.0? Pray your meta-reviewer is merciful.
🤬 Toxic review? File an "issue" (but expect crickets).
ARR loophole: "Score low in Feb? Resubmit to May ARR and aim for EMNLP."
The Big Picture: NLP’s Broken Incentives
Reviewer fatigue: Emergency reviewers = rushed, clueless feedback.
LLM monoculture: 90% of papers are "We scaled it bigger" or "Here’s a new benchmark for our 0.2% SOTA."
Findings graveyard: Where "technically sound but unsexy" papers go to die.
Final thought: "If you’re not gaming the system, the system is gaming you."
Adapted from JOJO极智算法 (2025-03-28)
Share your ACL 2025 horror stories below! Did you rebut or run?
The recent ICML 2025 review cycle has sparked outrage and dark humor across the ML community. Here’s a compilation of jaw-dropping anecdotes from Zhihu (China’s Quora) exposing the chaos, ranging from clueless reviewers to systemic failures.
Buckle up!
1. The "k-NN" Incident
User "写条知乎混日子" dropped this gem:
"The reviewer asked me: ‘What does the ‘k’ in k-NN stand for?’"
Yes, a reviewer at ICML, a top-tier ML conference, needed clarification on one of the most basic ML concepts.
2. The "Pro vs. RPO" Mix-Up
User "CpGD7" shared:
"The reviewer misread ‘rpo’ as ‘pro’ and questioned why our ‘advanced version’ lost to baselines. Next time, should I rename my main experiments ‘Promax’ to get accepted?"
3. The "I Didn’t Have Time to Check Proofs" Confession
User "虚无", a reviewer, admitted:
"I got assigned 5 theoretical papers. Checking proofs properly takes 7–10 days per paper. I only had time to verify the first two; the rest got high scores based on ‘intuition’ because I couldn’t validate the math."
This raises a serious ethical concern: Papers are being accepted/rejected based on guesses, not rigor.
4. The "Citation Mafia" Reviewer
User "Jane" reported:
"A 1-star reviewer demanded we cite 7 unrelated papers — 6 of which were by the same author. We withdrew the submission."
5. The "I Review Papers in a Field I Don’t Understand" Dilemma
User "Highlight" (a biochemist) was roped into reviewing ML theory:
"I’m from a biochemistry background. They assigned me 5 ML papers. I’m scrambling to understand the math over the weekend. They must be desperate."
6. The "R is Not the Real Numbers" Debacle
User "better" vented:
"A reviewer complained: ‘What is 𝐑? You never defined it. It can’t possibly mean the real numbers!’ …What else would it be?!"
7. The "Dataset Police" Strike Again
User "Reicholguk" faced this absurdity:
"A reviewer demanded we test on ‘popular’ datasets like Cora/Citeseer, ignoring that we already used Amazon Computer and Roman Empire graphs (which are standard in our subfield). Is this reviewer an AI? Even AI wouldn’t be this clueless."
8. The "I’ll Just Give Random Scores" Strategy
Many users reported suspiciously patterned scores:
"877129391241": "One of my papers got no reviews at all (blank). Another got all 1s and 2s."
"陈靖邦": "Got 4443 after an ICLR desk reject. Is this luck or a sign reviewers just clicked randomly?"
Why This Matters
These aren’t just funny fails; they reveal deep flaws in peer review:
Overworked reviewers (5+ papers, no opt-out).
Mismatched expertise (biochemists judging theory).
Lazy/bad-faith reviews (no comments, citation demands).
Systemic randomness (scores with no justification).
As User "虚无" warned:
"If ICML keeps this up, no serious researcher will want to submit or review."
The Big Question
Should top conferences like ICML:
Cap reviewer workloads?
Allow expertise-based opt-outs?
Penalize low-effort reviews?
What’s your take? Share your worst review horror stories below!
(Sources: Zhihu users "877129391241", "虚无", "CpGD7", "陈靖邦", "Jane", "Highlight", "better", "Reicholguk", "写条知乎混日子". Original posts here.)
CCF Recommendation Conference Deadline
Transferred from ccf-deadlines
CCF-A
October 13-16, 2025, Lotte Hotel World, Seoul, Republic of Korea: ACM Symposium on Operating Systems Principles (SOSP)
Deadline: Fri Apr 18th 2025 08:59:59 BST (2025-04-17 23:59:59 UTC-8)
Website of SOSP
CCF-B
October 12-18, 2025, Singapore: International Static Analysis Symposium (SAS)
Deadline: Mon May 5th 2025 12:59:59 BST (2025-05-04 23:59:59 UTC-12)
Website of SAS
CCF-A
November 16-20, 2025, Seoul, South Korea: International Conference on Automated Software Engineering (ASE)
Deadline: Sat May 31st 2025 12:59:59 BST (2025-05-30 23:59:59 UTC-12)
Website of ASE
CCF-B
October 12-18, 2025, Singapore: International Conference on Functional Programming (ICFP)
Deadline: Fri Jun 13th 2025 12:59:59 BST (2025-06-12 23:59:59 UTC-12)
Website of ICFP
CCF-A
April 12-18, 2026, Rio de Janeiro, Brazil: International Conference on Software Engineering (ICSE)
Deadline: Sat Jul 19th 2025 12:59:59 BST (2025-07-18 23:59:59 UTC-12)
Website of ICSE
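If you prefer to track these dates programmatically, here is a small Python sketch that converts a "UTC-12"-style deadline into hours remaining (the example uses the ICSE entry above; swap in your own date and offset):
from datetime import datetime, timedelta, timezone

def hours_until(deadline_str: str, utc_offset_hours: int) -> float:
    """Hours remaining until a deadline given as 'YYYY-MM-DD HH:MM:SS' at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    deadline = datetime.strptime(deadline_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return (deadline - datetime.now(timezone.utc)).total_seconds() / 3600

print(f"ICSE 2026 deadline in {hours_until('2025-07-18 23:59:59', -12):.1f} hours")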
AI Review for https://arxiv.org/abs/2503.10617v2
Summary
This paper introduces a new method, Compositional Subspace Representation Fine-tuning (CS-ReFT), designed to adapt large language models (LLMs) to multiple tasks while avoiding cross-task interference. The key innovation of CS-ReFT is its use of multiple orthonormal subspace transformations, each specialized for a distinct skill or task, and a lightweight router that composes these subspaces dynamically. The approach is evaluated on the AlpacaEval benchmark, where it demonstrates a high win rate compared to both larger models and other parameter-efficient fine-tuning methods.
Soundness
2
Presentation
3
Contribution
2
Strengths
Novel Method: Introduces CS-ReFT to address cross-task interference in multi-task learning using multiple orthonormal subspace transformations and a lightweight router.
Unique Approach: Distinguishes itself from traditional weight-based methods like LoRA by applying representation editing.
Clear Structure: The paper is well-organized and clearly written, with a logical flow from the introduction to method, experiments, and conclusions.
Visual Aids: Uses mathematical formulations and diagrams to effectively explain the subspace transformations and router mechanism.
Weaknesses
Scalability Concerns: Does not sufficiently discuss how CS-ReFT scales, especially in scenarios with a large number of tasks, as maintaining separate low-rank transformations may become computationally and storage-intensive.
Interpretability Issues: Lacks an analysis of the interpretability of the router's decisions. Insights into why specific weights are assigned to different subspaces could enhance trust in the model's behavior.
Questions
Scalability: How does CS-ReFT perform as the number of tasks increases? Are there strategies to optimize or reduce the computational overhead of maintaining separate subspace transformations for each task?
Router Interpretability: Can the router's decisions be interpreted or visualized to understand why specific subspaces are activated for certain inputs? Are there any patterns that emerge which could provide insights into the model's behavior?
Flag For Ethics Review
No ethics review needed.
Details Of Ethics Concerns
(None)
Rating
5
Confidence
4
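As an appendix to this review: to make the mechanism described in the summary concrete, here is a toy NumPy sketch of per-task orthonormal subspace edits composed by a lightweight router. It is only an illustration under assumed dimensions, a random edit matrix, and a softmax router, not the authors' CS-ReFT implementation, but it hints at why per-task subspaces add parameters linearly in the number of tasks (the scalability question) and how the routing weights could be inspected (the interpretability question).
import numpy as np

d, r, num_tasks = 16, 4, 3   # hidden size, subspace rank, number of tasks (toy values)
rng = np.random.default_rng(0)

def random_orthonormal(d, r, rng):
    # Orthonormal columns via QR decomposition: q.T @ q == I_r
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q

# One low-rank orthonormal subspace and one edit matrix per task (untrained, toy)
subspaces = [random_orthonormal(d, r, rng) for _ in range(num_tasks)]
edits = [0.1 * rng.standard_normal((r, r)) for _ in range(num_tasks)]

# Lightweight router: task weights from a linear map over the hidden state
router_w = 0.1 * rng.standard_normal((num_tasks, d))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def routed_subspace_edit(h):
    """Compose per-task subspace edits, weighted by the router (illustrative only)."""
    alphas = softmax(router_w @ h)          # inspectable routing weights, one per task
    out = h.copy()
    for a, U, W in zip(alphas, subspaces, edits):
        coords = U.T @ h                    # project h into the task's r-dim subspace
        out = out + a * (U @ (W @ coords))  # apply the task-specific edit inside that subspace
    return out, alphas

h = rng.standard_normal(d)
h_edited, alphas = routed_subspace_edit(h)
print(h_edited.shape, alphas.round(3))      # (16,) and the per-task routing weights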
The peer review process has long been upheld as the gatekeeper of quality, rigor, and integrity in academic discourse. Yet, as the recent NeurIPS 2024 keynote incident has reminded us, peer review does not extend its reach to one of the most visible stages in our field: the invited keynote.
https://youtube.com/shorts/nYY4cIg-_8I?si=bkB964WIg7QllyKR
In her keynote talk titled "How to optimize what matters most", MIT professor Rosalind Picard presented a slide that quoted a Chinese student, expelled from a top university for AI misuse, claiming, "Nobody at my school taught us morals or values". The slide included Picard’s commentary: "Most Chinese who I know are honest and morally upright".
This statement triggered widespread backlash, with AI researchers such as Jiao Sun of Google DeepMind and Yuandong Tian of Meta calling out the racial undertones and singling out of nationality as discriminatory.
Picard has since issued a public apology, acknowledging that referencing the student’s nationality was "unnecessary" and "caused unintended negative associations". NeurIPS organizers also released a statement clarifying that the comment violated their code of conduct and would be addressed internally.
But this raises a deeper issue: How did such a racially loaded narrative make it into one of the most prestigious speaking slots in AI without any checks?
In peer-reviewed papers, reviewers scrutinize phrasing, bias, and even the implications of an algorithm’s performance on diverse populations. Yet keynotes, arguably more visible and influential, operate outside the peer review system. A keynote speaker’s reputation is often deemed sufficient vetting. This incident, however, illustrates how even well-meaning researchers can unintentionally perpetuate harmful stereotypes when unchecked.
What’s more concerning is the narrative framing: invoking a student’s nationality to make a moral point, while generalizing from a single anecdote to an entire culture. It underscores a blind spot we often fail to address in AI research: that ethical responsibility is not just about the fairness of our models, but also the fairness of our words and stories.
If NeurIPS had subjected keynote slides or abstracts to a light peer review (or even a DEI advisory group), this incident might have been avoided. The community deserves keynote talks that not only inspire technically, but also model the inclusive values we preach in our papers.
This is a wake-up call. If we expect AI models to mitigate bias, we must also ask whether our academic institutions and conferences are holding themselves to the same standard. Bias is not just a data problem; it’s a human problem. And no, peer review won’t solve everything, but perhaps it should go a bit further than we thought.
Your thoughts? How can major conferences better safeguard against such missteps? Should keynotes be more rigorously reviewed, both ethically and technically?
Recently, someone surfaced (again) a method to query the decision status of an ICME 2025 submission before the official release. By sending requests to a specific API endpoint in the CMT system (https://cmt3.research.microsoft.com/api/odata/ICME2025/Submissions(Your_paper_id)), one can read the submission status from a StatusId field, where 1 means pending, 2 means accepted, and 3 means rejected.
This trick is not limited to ICME 2025. It appears that the same method can be applied to several other conferences, including: IJCAI, ICME, ICASSP, IJCNN and ICMR.
However, it is important to emphasize that using this technique violates the fairness and integrity of the peer-review process. Exploiting such a loophole undermines the confidentiality and impartiality that are essential to academic evaluations. This is a potential breach of academic ethics, and an official fix is needed to prevent abuse.
Below is a simplified Python script that demonstrates how this status monitoring might work. Warning: This code is provided solely for educational purposes to illustrate the vulnerability. It should not be used to bypass proper review procedures.
import requests
import time
import smtplib
from email.mime.text import MIMEText
from email.header import Header
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("submission_monitor.log"),
        logging.StreamHandler()
    ]
)

# List of submission URLs to monitor (replace 'Your_paper_id' accordingly)
SUBMISSION_URLS = [
    "https://cmt3.research.microsoft.com/api/odata/ICME2025/Submissions(Your_paper_id)",
    "https://cmt3.research.microsoft.com/api/odata/ICME2025/Submissions(Your_paper_id)"
]

# Email configuration (replace with your actual details)
EMAIL_CONFIG = {
    "smtp_server": "smtp.qq.com",
    "smtp_port": 587,
    "sender": "your_email@example.com",
    "password": "your_email_password",
    "receiver": "recipient@example.com"
}


def get_status(url):
    """
    Check the submission status from the provided URL.
    Returns the status ID and a success flag.
    """
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0',
            'Accept': 'application/json',
            'Referer': 'https://cmt3.research.microsoft.com/ICME2025/',
            # Insert your cookie here after logging in to CMT
            'Cookie': 'your_full_cookie'
        }
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code == 200:
            data = response.json()
            status_id = data.get("StatusId")
            logging.info(f"URL: {url}, StatusId: {status_id}")
            return status_id, True
        else:
            logging.error(f"Failed request. Status code: {response.status_code} for URL: {url}")
            return None, False
    except Exception as e:
        logging.error(f"Error while checking status for URL: {url} - {e}")
        return None, False


def send_notification(subject, message):
    """
    Send an email notification with the provided subject and message.
    """
    try:
        msg = MIMEText(message, 'plain', 'utf-8')
        msg['Subject'] = Header(subject, 'utf-8')
        msg['From'] = EMAIL_CONFIG["sender"]
        msg['To'] = EMAIL_CONFIG["receiver"]

        server = smtplib.SMTP(EMAIL_CONFIG["smtp_server"], EMAIL_CONFIG["smtp_port"])
        server.starttls()
        server.login(EMAIL_CONFIG["sender"], EMAIL_CONFIG["password"])
        server.sendmail(EMAIL_CONFIG["sender"], [EMAIL_CONFIG["receiver"]], msg.as_string())
        server.quit()
        logging.info(f"Email sent successfully: {subject}")
        return True
    except Exception as e:
        logging.error(f"Failed to send email: {e}")
        return False


def monitor_submissions():
    """
    Monitor the status of submissions continuously.
    """
    notified = set()
    logging.info("Starting submission monitoring...")
    while True:
        for url in SUBMISSION_URLS:
            if url in notified:
                continue
            status, success = get_status(url)
            if success and status is not None and status != 1:
                email_subject = f"Submission Update: {url}"
                email_message = f"New StatusId: {status}"
                if send_notification(email_subject, email_message):
                    notified.add(url)
                    logging.info(f"Notification sent for URL: {url} with StatusId: {status}")
        if all(url in notified for url in SUBMISSION_URLS):
            logging.info("All submission statuses updated. Ending monitoring.")
            break
        time.sleep(60)  # Wait for 60 seconds before checking again


if __name__ == "__main__":
    monitor_submissions()
Parting thoughts
While the discovery of this loophole may seem like an ingenious workaround, it is fundamentally unethical and a clear violation of the fairness expected in academic peer review. Exploiting such vulnerabilities not only compromises the integrity of the review process but also undermines the trust in scholarly communications.
We recommend that the CMT system administrators implement an official fix to close this gap. The academic community should prioritize fairness and the preservation of rigorous, unbiased review standards over any short-term gains that might come from exploiting such flaws.
The ACL Rolling Review (ARR) process is undergoing significant changes in 2025 to address concerns around review workload, decision reliability, and consistency with conferences. Here’s a summary of the key updates:
New Review Cycle: 5 Rounds Per Year
Instead of monthly submissions, ARR will now have 5 review rounds per year.
The planned review cycles for 2025 are February, May, July, October, and January 2026.
This change aims to balance reviewer workload and submission volume while maintaining quality.
Meta Reviews to Decide Findings
ARR will now make Findings decisions directly instead of passing the responsibility to conferences.
This is expected to improve alignment between ARR and conference decision-making.
However, concerns remain about reliability and the potential impact on acceptance rates.
Open Questions and Community Concerns
1️⃣ Reliability of ARR Meta Reviewers: Experiments at ACL 2025/EMNLP 2025 will assess decision consistency.
2️⃣ Acceptance Rate Changes: Current conferences accept ~35-40% of submissions. Higher acceptance via ARR might raise quality concerns.
3️⃣ Unified Findings Venue:
Previously, Findings papers were associated with specific conferences, such as Findings of ACL, Findings of EMNLP, and Findings of NAACL.
The proposed change would consolidate all Findings papers under a single umbrella venue (Findings of ACL).
Potential benefits: Simplifies submission logistics and reduces ambiguity about which conference a Findings paper belongs to.
Potential drawbacks: Findings papers would no longer be tied to prestigious main conferences (e.g., ACL, EMNLP), which could lower their perceived status and impact in the community. For example, in some academic institutions and funding agencies, having “Findings of ACL” may not be viewed as highly as “Findings of EMNLP” or “Findings of NAACL” due to differences in how venues are ranked.
What’s Next?
The 5-cycle review system is confirmed for 2025.
The Meta Review-driven Findings decision is still under discussion, with potential refinements based on feedback.
Researchers should stay updated on ACL policy announcements to navigate submission strategies effectively.
Read more here: https://www.aclweb.org/portal/sites/default/files/COPR-5-cycle-report-to-publish.pdf
What do you think of these changes? Will they improve the review process, or create new challenges? Let’s discuss!

Reproducibility is key in science, but in computer science and machine learning (ML) it is often overlooked. Even though most ML experiments should be comparatively easy to reproduce (except for papers that require very heavy computation), many top papers cannot be reproduced. Often the released code contains very basic and obvious errors and is far from even being reusable.
Here is a more elaborate blog post on this: https://www.mariushobbhahn.com/2020-03-22-case_for_rep_ML/
This gap raises questions:
Should CS and ML have stricter rules for reproducibility in peer reviews?
Should more CS and ML researchers retract papers that can't be reproduced?
How can we encourage researchers to make their work reproducible?
One effort to fix this is "Papers Without Code," where people can report ML papers that can't be reproduced. Its creator says, "Unreproducible work wastes time, and authors should make sure their work can be replicated."
Improving reproducibility could greatly help peer reviews. If reviewers could easily test the results in a paper, they could:
Check if the results are correct
Understand the work better
Give more helpful feedback
Spot potential problems early
This would lead to better quality research being published. It would also save time and resources in the long run, as other researchers wouldn't waste effort trying to build on work that doesn't actually work.
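In that spirit, even a minimal seeding discipline makes a paper far easier for reviewers to re-run. Here is a small Python sketch; the deep-learning framework call in the comment is an assumption, so add whichever framework you actually use:
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness so a run can be repeated exactly."""
    random.seed(seed)     # Python's built-in RNG
    np.random.seed(seed)  # NumPy's global RNG
    # If you use a deep learning framework, seed it as well, e.g. torch.manual_seed(seed)

set_seed(42)
print(np.random.rand(3))  # prints the same three numbers on every run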
What do you think? How can we make CS and ML research more reproducible? How would this change peer reviews?
Have you ever wondered, as a reviewer, under what circumstances you would choose to reject a paper?
To provide clarity, let's look at insights shared by an experienced reviewer recognized with a Top Reviewer Award at NeurIPS. These were originally posted in 2022, but I find them still highly relevant to peer review today.
Types of papers and how they influence rejection decisions:
Paradigm-shifting contributions
These papers fundamentally change your understanding or introduce revolutionary concepts, prompting an enthusiastic reaction. Such papers easily deserve a rating of "strong accept" or higher, and reviewers usually actively defend these submissions during discussions.
For example, a Stanford paper on differential privacy that cut the computational overhead from tens of times the normal workload down to roughly twice reshaped perceptions of privacy-computation complexity. Similarly, transformative works like the invention of the Transformer or adaptive optimization techniques (e.g., AdaGrad) fall into this category.
Incremental, but valuable confirmations
Approximately 70% of academic work fits here. These papers might not be groundbreaking but provide valuable confirmations and thorough examinations of existing ideas. They help reduce uncertainty about a research path by validating and refining current methodologies.
For instance, after Neural Tangent Kernel (NTK) theory initially explored two-layer neural networks, subsequent work by researchers like Allen Zhu expanded these results, covering cross-entropy loss and SGD settings. Although incremental, such solid contributions are typically accepted.
Minimal impact, repetitive, or superficial innovations
These papers often repackage existing knowledge or slightly adjust known models, datasets, or tasks without genuine novelty. They usually receive "borderline reject" scores, particularly in prestigious conferences like NeurIPS, unless impeccably executed and thorough.
Fundamental flaws and serious errors
Increasingly common are submissions containing fundamental methodological errors, misrepresented data, inadequate baselines, or unclear and contradictory explanations. Examples include using test sets as training sets or drastically misrepresenting performance metrics. Such papers typically receive immediate "borderline reject" ratings, with potential adjustments only if authors clarify concerns convincingly.
Beyond paper quality itself, ethical violations such as plagiarism or simultaneous submissions to multiple journals or conferences are grounds for automatic rejection, regardless of manuscript quality.
Summary Advice for New Reviewers:
Clearly differentiate between groundbreaking and incremental contributions.
Ensure thoroughness and accuracy, but stay open-minded about incremental advances.
Remain vigilant for methodological rigor and clarity.
Prioritize ethical compliance and rigorously check submissions against potential misconduct.
In the end, clear classification and objective judgment ensure both fair assessments and constructive feedback, benefiting the broader scientific community.
Recently, I was revising my paper and, out of curiosity, asked DeepSeek AI to give some sharp reviews of major AI conferences. The results were entertaining enough to share!
CVPR
The "crown jewel" of computer vision, with submission volume rivaling holiday train tickets and painfully low acceptance rates.
CVPR is an event that raises the blood pressure of AI researchers globally. If your paper gets rejected, don't worry—90% of people share your fate.
CVPR is like a battlefield for papers, with reviewers acting as ruthless supervisors holding magnifying glasses to pick faults. You're the student endlessly tormented.
Hot topics? They change annually. This year it’s diffusion models; next year, it'll probably be something even more mystical. CVPR teaches us: forget eternity; pursue temporary insanity instead.
NeurIPS
The "Oscars" of machine learning, with acceptance rates heartbreakingly low.
NeurIPS resembles an academic stand-up comedy show, where everyone gathers to share their "innovative ideas." But don’t celebrate prematurely; your paper might still be rejected for "inelegant experimental design."
Reviewers here are like scholarly monks living deep in the mountains, meditating on how to perfect each paper. Their feedback sounds like verdicts from the void: "Sorry, your model isn't mystical enough."
NeurIPS teaches one valuable lesson — before conquering the world, first conquer your reviewers.
ACL
The "ancestor" of NLP, with acceptance rates low enough to shatter hearts.
ACL is like a dating event for language models, where researchers showcase their masterpieces only to realize "other people's children" always seem better.
Reviewers here act like literary inquisitors, scrutinizing your paper under a microscope for every potential logical flaw: "Why didn’t you use a larger dataset?" "Why didn't you try that model instead?"
ACL reveals an essential truth: forget fairness; believe only in your capabilities.
ICML
The "Spring Festival Gala" of machine learning, enormous and influential.
ICML is like an academic amusement park, where everyone chases the latest research trends. But remember, the entry ticket is your paper.
Reviewers here act like academic police, checking if your paper meets their aesthetic standards: "Your experimental design isn't innovative enough," or "Your results aren't impressive enough."
ICML reminds us there are no shortcuts, only relentless effort.
ICLR
The "rising star" of deep learning, with acceptance rates crushingly low.
ICLR is an academic influencer’s hotspot, with everyone eager to showcase their research. However, the entry threshold is deceptively high.
Reviewers here behave like fashion bloggers, critically evaluating if your paper aligns with their aesthetic tastes: "Your model isn’t flashy enough," "Your results aren't stunning enough."
ICLR imparts the truth—work hard, but never forget to be trendy.
The "mysticism" and "reality" of those conferences
Submissions are lotteries: Acceptance often relies heavily on your luck.
🧩 Reviewers are puzzles: Their feedback can feel enigmatic, originating from mysterious realms.
Academic rat race: Top conferences resemble massive "paper factories," with everyone desperately chasing trends and novelty, sometimes realizing too late that their efforts were in vain.
Final thoughts
If your goal is to publish in top conferences, brace yourself for a prolonged psychological battle.
Yet remember, academic research aims to explore truths, not chase prestigious conference halos.
View submissions as part of your scholarly journey and feedback from reviewers as opportunities for growth. True academic value isn't measured by how many top papers you publish, but by your contributions to advancing the field.
Since 2021, ACL has moved away from Softconf and introduced the new Rolling Review system (ARR). Official website: https://aclrollingreview.org
Rolling Review divides paper submissions into two phases:
Submit your paper to the ARR system (“make submission”). Reviews will be provided after one month.
Commit your submission from step 1 to your target conference.
This means you initially submit your paper to the ARR system rather than directly to the conference, allowing flexibility in choosing conferences afterward.
For a crash course on the ARR review paradigm, you can watch this video by Shubhra Kanti Karmaker Santu:
https://youtu.be/DaoCLEghXyU?si=XuNrwgsg4APzDQeA