👋 Welcome! Feel free to register (verified or anonymous) and share your thoughts or story — your voice matters here! 🗣️💬
🚀 Now Live: Our AI-powered paper review tool is available in beta! Perfect for CS conference submissions — get fast, targeted feedback to improve your chances of acceptance.
👉 Try it now at review.cspaper.org
  • Official announcements from CSPaper.org

    3 Topics · 4 Posts (latest post by root)
    It took just seven days for CSPaper Review to surpass 2,000 registered users, and we're only getting started!

    - Nearly 8,000 reviews completed
    - Over 80,000 pages reviewed

    [image: 1752141521438-screenshot-2025-07-10-at-11.58.21.png]

    To every researcher, developer, and partner who has tried, shared, and supported the platform: thank you. Your enthusiasm is turning our vision of faster, fairer peer review into reality.

    3-Minute User Survey: Help Shape CSPaper – Vote for Our Next Feature!
    You set our roadmap! Spend three minutes on our survey to tell us your most urgent needs and pain points, and you'll have the option of becoming a beta tester [limited seats]. Survey link → Survey

    Become a Beta Tester [Limited seats] — Help Shape What's Next
    Join our exclusive beta tester group and get a front-row seat to the future of our platform. Here's what you'll get:
    - 🧪 Early Access – try out new features before anyone else
    - Direct Influence – share your feedback and help shape product decisions
    - Insider Role – be part of a small, trusted group helping us improve and innovate
    - Perks & Rewards – enjoy occasional surprises, sneak peeks, and recognition for your input

    Ready to help us build something better? Sign up through the survey and unlock early access! Beta tester sign-up → Beta tester

    Thank you again to the 2,000+ members who put their trust in us. Together, let's usher in a new era of intelligent peer review.

    Try it now: https://review.cspaper.org
    Feedback: support@cspaper.org
    Community forum: https://cspaper.org

    — The CSPaper Review Team
    Built by the research community, for the research community, helping you boost productivity with ease. Feel free to share this post with colleagues and friends and explore the power of AI-driven research together!
  • AI-powered paper reviews for top CS conferences — fast, targeted insights to help boost your acceptance odds. Discuss anything related to the CSPaper Review Tool at review.cspaper.org: ask questions, report issues, or suggest improvements.

    5 Topics · 5 Posts (latest post by root)
    Hi CSPaper community,

    We're excited to share a continued update to https://review.cspaper.org with new features and improvements based on your amazing feedback!

    New Conference Support: IJCAI
    The IJCAI main track is here! You can now review your paper with the IJCAI main-track agent workflow directly on CSPaper.
    [image: 1752442641772-screenshot-2025-07-13-at-23.03.31.jpg]
    If you urgently need review support for a particular conference, you can give us feedback at https://docs.google.com/forms/d/12XNzsjVqN3cvIxUhwio7AMwVz9ZtWJcY0RM9UkilVDM, email us at support@cspaper.org, or simply reply right here in the forum. We will prioritize the conference with the most votes.

    Overall Score Calibration Step
    We introduced an additional step that verifies and calibrates the overall review score against the individual sub-scores. This change responds to reports of inconsistencies, specifically cases where the overall score appeared misaligned with the ratings in other evaluation dimensions. (A purely illustrative sketch of this kind of consistency check appears at the end of this post.)
    [image: 1752441486757-screenshot-2025-07-13-at-23.08.13.jpg]

    My Review UI Enhancement
    We added conference and track information to your review list, along with a refreshed look & feel.
    [image: 1752441785050-screenshot-2025-07-13-at-23.22.52.png]

    Compliance: Cookie Banner
    [image: 1752441929278-screenshot-2025-07-13-at-23.24.20.png]

    Bug Fixes
    We fixed a few LaTeX processing issues when reviewing arXiv papers. Please report it if something does not add up, for example if your paper clearly has more pages but our agent says "Your paper is too short (< 2 pages)". You can email us at support@cspaper.org or simply reply right here in the forum.

    What's Next? You Decide!
    We're already planning our next set of features, and we're listening to you. Please select the most important feature for you: https://docs.google.com/forms/d/e/1FAIpQLScEjKk0t8JekNQiUXfT2n4aIwoW80BpHxG_BrH_XEyJff1mTQ/viewform

    Cheers,
    The CSPaper Team
    support@cspaper.org
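    As a purely illustrative aside on the calibration step mentioned above (this is not CSPaper's actual implementation; the sub-score names, weights, and tolerance below are hypothetical), here is a minimal sketch of the kind of overall-vs-sub-score consistency check such a step could perform:

    ```python
    # Hypothetical sketch of an overall-vs-sub-score consistency check.
    # NOT the CSPaper implementation: dimension names, weights, scales,
    # and the tolerance are illustrative assumptions only.

    SUB_SCORE_WEIGHTS = {
        "soundness": 0.4,
        "novelty": 0.3,
        "clarity": 0.2,
        "significance": 0.1,
    }

    def calibrate_overall(overall: float, sub_scores: dict[str, float],
                          tolerance: float = 1.0) -> float:
        """Return the overall score, nudged toward the weighted mean of the
        sub-scores when the two disagree by more than `tolerance` points."""
        weighted_mean = sum(
            SUB_SCORE_WEIGHTS[name] * score for name, score in sub_scores.items()
        )
        if abs(overall - weighted_mean) <= tolerance:
            return overall  # already consistent with the sub-scores
        # Otherwise pull the overall score halfway toward the weighted mean.
        return round((overall + weighted_mean) / 2, 1)

    if __name__ == "__main__":
        subs = {"soundness": 6.0, "novelty": 5.0, "clarity": 7.0, "significance": 4.0}
        # An overall score of 3.0 is flagged as inconsistent and adjusted
        # toward the sub-score weighted mean (5.7 under these weights).
        print(calibrate_overall(overall=3.0, sub_scores=subs))
    ```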
  • 80 Topics · 255 Posts (latest post by Joanne)
    [image: 1752595629556-54eed4de-97d5-4dbd-9c83-e500ce1d8ccc-image.png]
    Since its debut in 2015, Batch Normalization (BN) has seen its original motivation repeatedly "debunked" by follow-up work, yet in 2025 it still earned Sergey Ioffe and Christian Szegedy the ICML Test-of-Time award.
    [image: 1752594641740-d2c74fed-83ea-4a92-ab8a-cb91b5b2d895-image.png]
    What does that really say? This article traces BN's canonisation by following two threads: the award's evaluation logic and the layer's systemic impact on deep learning.

    1. The Test-of-Time award is not about being theoretically perfect
    The ICML guidelines are explicit: the Test-of-Time award honours papers published ten years ago that shaped the field in the decade since. It does not re-audit theoretical soundness.
    - Impact metrics: the BN paper has been cited more than 60,000 times, making it one of the most-cited deep-learning papers of its era.
    - Downstream work: from regularisation and optimisation to architecture design, hundreds of papers start from BN to propose improvements or explanations.
    - Practical penetration: BN is baked into almost every mainstream DL framework's default templates, becoming a "no-brainer" layer for developers.
    Conclusion: what the committee weighs is, "If you removed this paper ten years later, would the community be missing a cornerstone?" Theoretical controversy does not diminish its proven engineering value.

    2. So what is the theory behind BatchNorm?
    The original motivation was to reduce Internal Covariate Shift (ICS): as parameters change, the input distribution of downstream layers drifts, forcing them to continually adapt, which slows and destabilises training. BN standardises activations within each mini-batch, explicitly anchoring the distribution and decoupling layers.
    Two-step recipe
    [image: 1752595075011-6cb3f6e4-d9ec-4ac0-bdd5-68a67224aaa4-image.png]
    Key derivations
    - Normalisation → stable gradients: zero-mean/unit-variance keeps activations in "flat" regions of nonlinearities, mitigating exploding/vanishing gradients.
    - Affine → full expressiveness: adding the learnable parameters γ and β re-parameterises rather than constrains the network.
    - Train vs. inference: batch statistics at train time; running averages for deterministic inference (see the minimal sketch below).
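    To make the two-step recipe and the train-vs-inference distinction concrete, here is a minimal NumPy sketch of a BatchNorm layer. This is an illustrative reconstruction, not the paper's or any framework's implementation; the epsilon and momentum values are common defaults assumed here, not prescribed by the original work.

    ```python
    import numpy as np

    class BatchNorm1d:
        """Minimal BatchNorm over a (batch, features) array.

        Step 1: normalise each feature to zero mean / unit variance.
        Step 2: apply the learnable affine transform y = gamma * x_hat + beta.
        """

        def __init__(self, num_features, eps=1e-5, momentum=0.1):
            self.gamma = np.ones(num_features)   # learnable scale
            self.beta = np.zeros(num_features)   # learnable shift
            self.eps = eps
            self.momentum = momentum
            # Running statistics used at inference time.
            self.running_mean = np.zeros(num_features)
            self.running_var = np.ones(num_features)

        def __call__(self, x, training=True):
            if training:
                # Use the statistics of the current mini-batch ...
                mean = x.mean(axis=0)
                var = x.var(axis=0)
                # ... and update the running averages for later inference.
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
            else:
                # Deterministic inference: reuse the running averages.
                mean, var = self.running_mean, self.running_var
            x_hat = (x - mean) / np.sqrt(var + self.eps)   # step 1: normalise
            return self.gamma * x_hat + self.beta          # step 2: affine

    # Example: a mini-batch of 4 samples with 3 features each.
    bn = BatchNorm1d(3)
    out = bn(np.random.randn(4, 3), training=True)
    ```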
    Theoretical evolution
    Later studies (e.g. Santurkar et al. 2018; Balestriero & Baraniuk 2022) argue that ICS is not the sole driver. They instead find that BN smooths the loss landscape and improves gradient predictability, or acts as an adaptive, unsupervised initialisation, while still analysing why the two-step recipe works.

    3. Each challenge to the theory has reinforced its practical value

    | Year | Key objection | Outcome & new view |
    | --- | --- | --- |
    | 2018 (MIT) | Injecting noise after BN shows "ICS is not essential"; the real benefit is a smoother optimisation landscape and more predictable gradients. | Training still accelerates → BN reframed as an "optimiser accelerator". |
    | 2022 (Rice & Meta) | Geometric view: BN resembles an unsupervised adaptive initialiser and, via mini-batch noise, enlarges decision-boundary margins. | Explains BN's persistent generalisation boost. |

    4. Five cascading effects on the deep-learning stack
    1) Unlocked ultra-deep training: ResNet and its descendants scaled from tens to hundreds or even thousands of layers largely because a BN layer can be slotted into every residual block.
    2) Halved (or better) training time and compute: in its release year, BN cut the training steps needed to match the ImageNet SOTA to 1/14 of the original, directly pushing large-scale adoption in industry.
    3) Normalised high learning rates / weak initialisation: tedious hand-tuning became optional, freeing AutoML and massive hyper-parameter sweeps.
    4) Spawned the "Norm family": LayerNorm, GroupNorm, RMSNorm... each targets a niche, but all descend from BN's interface and analysis template.
    5) Reshaped optimisation theory: BN-inspired ideas like "landscape smoothing" and "re-parameterisation" rank among the most-cited optimisation topics in recent years.

    5. Why systemic impact outweighs a perfect theory
    - Industrial priority: the top question is whether a technique improves stability, speed, or cost; BN does.
    - Scholarly spill-over: even evolving explanations are fertile academic fuel once the phenomena are reproducible.
    - Ecosystem lock-in: once written into framework templates, textbooks, and inference ASIC kernels, replacement costs skyrocket, creating a de facto standard.
    One-sentence summary: like TCP/IP, even if first-generation assumptions later prove flawed, BN remains the "base protocol" of the deep-learning era.

    6. Looking ahead
    - Open question: micro-batch training and self-attention violate BN's statistical assumptions; will that spark a next-generation normalisation?
    - Methodology: BN's success hints that intuition plus engineering validation can drag an entire field forward faster than a closed-form theory.
    - Award lesson: the Test-of-Time prize reminds us that long-term influence ≠ flawless theory; it is about leaving behind reusable, recombinable "public Lego bricks".

    Recommended reading
    - Sergey Ioffe & Christian Szegedy, Batch Normalization (2015)
    - Shibani Santurkar et al., How Does Batch Normalization Help Optimization? (2018)
    - Randall Balestriero & Richard Baraniuk, Batch Normalization Explained (2022)
  • Anonymously share data, results, or materials. Useful for rebuttals, blind submissions and more. Only unverified users can post (and edit or delete anytime afterwards).

    4 Topics · 4 Posts (latest post by H)
    Impl. based on nr0034je9.zip.

    Table A: Model Performance on NLP Benchmarks

    | Model | SST-2 (Acc) | MNLI (Acc) | QNLI (Acc) | CoLA (Matthews) | Avg Score |
    | --- | --- | --- | --- | --- | --- |
    | BERT-Base | 91.2 | 84.6 | 90.1 | 58.2 | 81.0 |
    | RoBERTa-Base | 92.3 | 87.4 | 91.8 | 63.1 | 83.7 |
    | GPT-3 (175B) | 94.1 | 88.9 | 93.0 | 66.4 | 85.6 |
    | Our Method | 94.8 | 89.7 | 93.5 | 68.9 | 86.7 |

    Table B: Ablation Study on Model Components (Evaluated on MNLI)

    | Configuration | Attention Mechanism | Pretraining Corpus | MNLI (Acc) |
    | --- | --- | --- | --- |
    | Full Model | Multi-head Self-Attn | Custom + Public | 89.7 |
    | – w/o Custom Corpus | Multi-head Self-Attn | Public Only | 87.1 |
    | – w/o Attention Refinement Block | Basic Self-Attn | Custom + Public | 86.5 |
    | – w/o Positional Embeddings | Multi-head Self-Attn | Custom + Public | 85.2 |
    | – Random Initialization | — | — | 72.4 |