Reviewing for NeurIPS 2025 Datasets & Benchmarks: Insights from the Trenches
As a reviewer for the NeurIPS 2025 Datasets & Benchmarks (D&B) track, I’ve been working through my assigned submissions, and, as many of you can probably relate, my inbox has been buzzing with notifications from the program chairs. While reviewing remains a thoughtful, human-driven task, this year’s workflow includes a few important upgrades worth sharing, especially for researchers who care about the transparency, reproducibility, and ethics of peer review.
Here’s a quick behind-the-scenes look at how the process works in 2025 and how it differs from previous years.
Automatic Dataset Reports: A New Gatekeeping Assistant
One of the most noticeable improvements this year is the automatic generation of a Dataset Reviewer Report for each submission that includes a dataset. This report is not a replacement for human judgment, but rather a helpful tool to assist reviewers in evaluating dataset accessibility, structure, and metadata completeness.
This report is based on a metadata format called Croissant, and it checks:
- Whether the dataset URLs actually work
- If the dataset files can be downloaded and accessed
- If a valid license and documentation are included
- Whether basic ethical and Responsible AI (RAI) information is present
Think of this as a checklist that helps filter out incomplete or misleading submissions early on — without you needing to spend your first 30 minutes chasing broken links.
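To make that checklist concrete, here is a rough sketch of the kind of URL and license check the report automates. This is my own illustration, not the official tooling: the field names (license, url, distribution, contentUrl) follow the Croissant/schema.org vocabulary, but the function name and its simplified logic are assumptions.

```python
import json
import urllib.request

def quick_croissant_check(path: str) -> dict:
    """Illustrative sketch of the kinds of checks a dataset report performs.

    Assumes a Croissant JSON-LD file with standard top-level fields
    ("license", "url", "distribution"); the real report does much more.
    """
    with open(path) as f:
        meta = json.load(f)

    findings = {
        "has_license": bool(meta.get("license")),
        "has_landing_page": bool(meta.get("url")),
        "broken_urls": [],
    }

    # Send a HEAD request to each declared file to see whether it resolves.
    for file_obj in meta.get("distribution", []):
        url = file_obj.get("contentUrl")
        if not url:
            continue
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req, timeout=10)
        except Exception:
            findings["broken_urls"].append(url)

    return findings

print(quick_croissant_check("croissant.json"))
```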
You also get auto-generated Python code snippets in the report to help you load and explore the dataset directly from platforms like Kaggle, Hugging Face, or Dataverse. It’s a small touch, but it really reduces friction during the review.
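I won’t paste a generated report here, but for a dataset hosted on the Hugging Face Hub the snippet you get is roughly of this shape (the repository ID below is a placeholder, not a real submission):

```python
from datasets import load_dataset

# "some-author/some-dataset" is a placeholder repository ID.
ds = load_dataset("some-author/some-dataset", split="train")

print(ds)      # features and number of rows
print(ds[0])   # peek at the first record before digging deeper
```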
Responsible Reviewing Is Now Mandatory (Not Just Encouraged)
The Responsible Reviewing Initiative is not new, but it’s more strictly enforced this year. Reviewers are now expected to look for the following in each dataset paper:
- Is the dataset publicly available and reproducible?
- Are ethical considerations and data limitations addressed?
- Are RAI fields (like bias, demographic info, or collection methods) present or at least acknowledged?
- Is the licensing and permission status clear?
These were optional or lightly emphasized in previous years, but they now carry real weight in your evaluation — especially for a track that centers on datasets and benchmarks.
If a dataset claims to be open but is inaccessible, lacks a license, or ignores potential bias or harm, reviewers are encouraged to flag this as a major concern.
Review Process Reminders
Here are a few reminders for reviewers in 2025:
- Don't use LLMs to process or summarize submissions — per NeurIPS’s LLM usage policy, reviewing is strictly human-only.
- Be proactive in checking for conflicts of interest. Not all COIs are perfectly detected by the system.
- Every submission matters — even if the topic is outside your direct interests, you’re expected to review it unless there’s a serious reason you cannot (in which case, contact your Area Chair).
- Watch your assignment list — more papers may get added during the review period.
What’s Better Compared to Last Year?
| Feature | 2024 | 2025 |
| --- | --- | --- |
| Dataset Accessibility Check | Manual by reviewer | Auto-checked by metadata report |
| Responsible AI Metadata | Encouraged | Now explicitly reviewed |
| Review Support Tools | Basic | Code snippets and report summaries |
| Licensing and Ethics | Optional in many cases | More formally required |
| LLM Policy | Vague enforcement | Strict ban on use in reviews |
🧠 Takeaways for Researchers and Reviewers
The D&B track is evolving to match the increasing complexity of data-driven research. If you’re a researcher, this means submitting your dataset now requires more than just a ZIP file on Google Drive — it needs structure, documentation, and ethical awareness.
If you’re a reviewer, you now have better tools to assess those aspects — but also more responsibility to do so thoughtfully.
All of this helps build a stronger, more reproducible research ecosystem and holds dataset contributions to the same standard of rigor as model papers.