Reviewing for NeurIPS 2025 Datasets & Benchmarks: Insights from the Trenches
As a reviewer for the NeurIPS 2025 Datasets & Benchmarks (D&B) track, I’ve been working through my assigned submissions, and, as many of you can probably relate, my inbox has been buzzing with notifications from the program chairs. While reviewing remains a thoughtful, human-driven task, this year’s workflow includes a few important upgrades worth sharing, especially for researchers who care about the transparency, reproducibility, and ethics of peer review.
Here’s a quick behind-the-scenes look at how the process works in 2025 and how it differs from previous years.
Automatic Dataset Reports: A New Gatekeeping Assistant
One of the most noticeable improvements this year is the automatic generation of a Dataset Reviewer Report for each submission that includes a dataset. This report is not a replacement for human judgment, but rather a helpful tool to assist reviewers in evaluating dataset accessibility, structure, and metadata completeness.
This report is based on a metadata format called Croissant, and it checks:
- Whether the dataset URLs actually work
- If the dataset files can be downloaded and accessed
- If a valid license and documentation are included
- Whether basic ethical and Responsible AI (RAI) information is present
Think of this as a checklist that helps filter out incomplete or misleading submissions early on — without you needing to spend your first 30 minutes chasing broken links.
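To make that checklist concrete, here is a rough sketch of the kind of URL and license check the report automates. This is my own illustration, not the official tooling: the field names (license, url, distribution, contentUrl) follow the Croissant/schema.org vocabulary, but the function name and its simplified logic are assumptions.

```python
import json
import urllib.request

def quick_croissant_check(path: str) -> dict:
    """Illustrative sketch of the kinds of checks a dataset report performs.

    Assumes a Croissant JSON-LD file with standard top-level fields
    ("license", "url", "distribution"); the real report does much more.
    """
    with open(path) as f:
        meta = json.load(f)

    findings = {
        "has_license": bool(meta.get("license")),
        "has_landing_page": bool(meta.get("url")),
        "broken_urls": [],
    }

    # Send a HEAD request to each declared file to see whether it resolves.
    for file_obj in meta.get("distribution", []):
        url = file_obj.get("contentUrl")
        if not url:
            continue
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req, timeout=10)
        except Exception:
            findings["broken_urls"].append(url)

    return findings

print(quick_croissant_check("croissant.json"))
```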
You also get auto-generated Python code snippets in the report to help you load and explore the dataset directly from platforms like Kaggle, Hugging Face, or Dataverse. It’s a small touch, but it really reduces friction during the review.
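I won’t paste a generated report here, but for a dataset hosted on the Hugging Face Hub the snippet you get is roughly of this shape (the repository ID below is a placeholder, not a real submission):

```python
from datasets import load_dataset

# "some-author/some-dataset" is a placeholder repository ID.
ds = load_dataset("some-author/some-dataset", split="train")

print(ds)      # features and number of rows
print(ds[0])   # peek at the first record before digging deeper
```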
Responsible Reviewing Is Now Mandatory (Not Just Encouraged)
The Responsible Reviewing Initiative is not new, but it’s more strictly enforced this year. Reviewers are now expected to look for the following in each dataset paper:
- Is the dataset publicly available and reproducible?
- Are ethical considerations and data limitations addressed?
- Are RAI fields (like bias, demographic info, or collection methods) present or at least acknowledged?
- Is the licensing and permission status clear?
These were optional or lightly emphasized in previous years, but they now carry real weight in your evaluation — especially for a track that centers on datasets and benchmarks.
If a dataset claims to be open but is inaccessible, lacks a license, or ignores potential bias or harm, reviewers are encouraged to flag this as a major concern.
Review Process Reminders
Here are a few reminders for reviewers in 2025:
- Don't use LLMs to process or summarize submissions — per NeurIPS’s LLM usage policy, reviewing is strictly human-only.
- Be proactive in checking for conflicts of interest. Not all COIs are perfectly detected by the system.
- Every submission matters — even if the topic is outside your direct interests, you’re expected to review it unless there’s a serious reason you cannot (in which case, contact your Area Chair).
- Watch your assignment list — more papers may get added during the review period.
What’s Better Compared to Last Year?
| Feature | 2024 | 2025 |
| --- | --- | --- |
| Dataset Accessibility Check | Manual by reviewer | Auto-checked by metadata report |
| Responsible AI Metadata | Encouraged | Now explicitly reviewed |
| Review Support Tools | Basic | Code snippets and report summaries |
| Licensing and Ethics | Optional in many cases | More formally required |
| LLM Policy | Vague enforcement | Strict ban on use in reviews |
🧠 Takeaways for Researchers and Reviewers
The D&B track is evolving to match the increasing complexity of data-driven research. If you’re a researcher, this means submitting your dataset now requires more than just a ZIP file on Google Drive — it needs structure, documentation, and ethical awareness.
If you’re a reviewer, you now have better tools to assess those aspects — but also more responsibility to do so thoughtfully.
All of this helps build a stronger, more reproducible research ecosystem and holds dataset contributions to the same standard of rigor as model papers.