Reproducibility is key in science, but in computer science and machine learning (ML) it is often overlooked. Even though most ML experiments should be comparatively easy to reproduce (with the exception of papers that require very heavy computation), many top papers can't be reproduced. Often the code is not even released, or it contains such basic and obvious errors that it is far from reusable.
Here is a more elaborate blog post on this: https://www.mariushobbhahn.com/2020-03-22-case_for_rep_ML/
This gap between expectation and practice raises several questions:
Should CS and ML have stricter rules for reproducibility in peer reviews?
Should more CS and ML researchers retract papers that can't be reproduced?
How can we encourage researchers to make their work reproducible?
One effort to fix this is "Papers Without Code," where people can report ML papers that can't be reproduced. Its creator says, "Unreproducible work wastes time, and authors should make sure their work can be replicated."
Improving reproducibility could greatly help peer reviews. If reviewers could easily test the results in a paper, they could:
- Check if the results are correct
- Understand the work better
- Give more helpful feedback
- Spot potential problems early
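As a minimal sketch of what "easily test the results" could mean in practice: if authors pin their random seeds, a reviewer can rerun an experiment and check that it yields the same number every time. The function below is a hypothetical toy stand-in for a training run, not any specific paper's code.

```python
import random

def run_experiment(seed: int) -> float:
    """Toy stand-in for a training run: returns a pseudo 'accuracy'.

    A real experiment would also need pinned library versions,
    fixed data ordering, and deterministic hardware settings.
    """
    rng = random.Random(seed)
    # Pretend this score depends on random initialization.
    return sum(rng.random() for _ in range(1000)) / 1000

# With the seed pinned, two runs must agree exactly.
result_a = run_experiment(seed=42)
result_b = run_experiment(seed=42)
assert result_a == result_b
print(f"Reproduced result: {result_a:.4f}")
```

A check this simple is what reviewers currently cannot do for most submissions, because the code or the seeds are missing.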
This would lead to higher-quality research being published. It would also save time and resources in the long run, as other researchers wouldn't waste effort trying to build on results that don't hold up.
What do you think? How can we make CS and ML research more reproducible? How would this change peer reviews?