When Peer Review Goes Nuclear: The Saga of *UltraLightUNet* and the EMCAD Déjà Vu

Posted in Artificial intelligence & Machine Learning
Tags: iclr2025, openreview, review, peer review, strong reject, stitching, state-of-the-art, incremental, innovation

    The annual ICLR peer-review season is always a drama buffet, but the story behind “UltraLightUNet: Rethinking U-shaped Network with Multi-kernel Lightweight Convolutions for Medical Image Segmentation” (see OpenReview submission page) is the kind of spicy saga that makes you want to grab popcorn and clutch your arXiv preprints tight.


    Setting the Stage: UltraLightUNet Enters the Arena

    UltraLightUNet promised a new era for medical image segmentation: tiny, nimble, yet supposedly able to outperform those chunky transformer and UNet-based behemoths hogging GPUs in radiology labs. The core claims?

    • Multi-Kernel Inverted Residual (MKIR) Block: Squeeze more features from less compute by mixing kernel sizes (see the sketch after this list).
    • Multi-Kernel Inverted Residual Attention (MKIRA) Block: Further boost performance by gluing multi-focal attention onto the above.
    • 2D and 3D: Not just flat slices — volumetric stuff too!
    • Resource Efficiency: “Look, Ma, no FLOPs!” (Okay, just less.)
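
    For the curious, here is a minimal PyTorch sketch of what an MKIR-style block might look like, reconstructed only from the description above: an inverted residual (expand, depthwise, project) whose depthwise stage runs several kernel sizes in parallel. The class name, expansion ratio, and kernel-size set are illustrative guesses, not the authors’ code; the MKIRA attention variant is omitted because the post never pins down what “multi-focal attention” means.

    ```python
    import torch
    import torch.nn as nn

    class MKIRBlock(nn.Module):
        """Hypothetical Multi-Kernel Inverted Residual block (a sketch, not the paper's code).

        Guessed structure: expand channels with a 1x1 conv, run several depthwise
        convolutions with different (odd) kernel sizes in parallel, sum their
        outputs, project back down, and add a residual connection.
        """

        def __init__(self, channels, expansion=2, kernel_sizes=(1, 3, 5)):
            super().__init__()
            hidden = channels * expansion
            self.expand = nn.Sequential(
                nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU6(inplace=True),
            )
            # groups=hidden makes each conv depthwise; padding=k//2 keeps the
            # spatial size unchanged for odd kernel sizes.
            self.dw_convs = nn.ModuleList(
                [nn.Conv2d(hidden, hidden, k, padding=k // 2, groups=hidden, bias=False)
                 for k in kernel_sizes]
            )
            self.project = nn.Sequential(
                nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(channels),
            )

        def forward(self, x):
            h = self.expand(x)
            h = sum(conv(h) for conv in self.dw_convs)  # multi-kernel fusion by summation
            return x + self.project(h)  # inverted residual with skip connection
    ```

    Whether summation is how the paper actually fuses the kernel branches is exactly the kind of detail the reviewers argued about; the sketch only shows the shape of the idea.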

    But here’s where the fun begins.


    Peer Review: When All Paths Lead to EMCAD

    At ICLR, UltraLightUNet was greeted by two of the most feared words in the reviewer lexicon: Strong Reject. Here’s what happened:

    Reviewer #1 (jeKK):

    "This is basically EMCAD (CVPR 2024), just with a few tweaks and a new name. And the same authors too! Is this plagiarism or dual submission?"

    Reviewer jeKK didn’t just drop accusations; they provided side-by-side architectural diagrams, tables, and direct quotes. The authors countered with a robust defense: “It’s not the same, we swear! Ours is the electric scooter to EMCAD’s highway system!” (Yes, that analogy was real.)

    Reviewer #2 (zJLj):

    "Novelty? Meh. Depth-wise convolution and attention have been milked dry. All you did was N to N+1 module fusion. Not impactful, and your baseline results are fishy—why copy them from other non-benchmark papers instead of rerunning?"

    Both reviewers, in classic OpenReview fashion, asked for more comparisons, more 3D SOTA, more theoretical depth, more interpretability, more everything.


    Rebuttal Escalation: Authors Go All-In

    The authors responded thoroughly. They added new experiments, ablation tables, “activation heatmaps,” and wall-of-text explanations (math, analogies, even referencing the kitchen sink). They pointed to marginal improvements and hyper-efficiency. They even compared their work’s relevance to historic ICLR empirical classics like MobileViT and 3D UX-Net.

    But reviewers weren’t buying it. The final scores? “1 – strong reject” stayed glued to the top of the review. (Read the full OpenReview thread for a peer-review novella.)


    The “Stitching Modules” Controversy — A Deeper Look

    While the phrase “module stitching” might sound like the punchline to a conference meme, it has become a lightning rod in the peer-review discourse — especially in fields like computer vision where architectural innovation often means remixing familiar blocks. The UltraLightUNet rejection was quickly discussed on Chinese academic social media, with the shorthand “缝合怪” (stitching monster) capturing both the amusement and the anxiety of the community. But this isn’t just idle gossip — there are real research and community issues at stake.

    Why Do Reviewers Care So Much About "Stitching"?

    • The Core Critique:
      Both ICLR reviewers hammered the point that UltraLightUNet was more of a “module soup” than a genuinely new architecture. They saw it as assembling well-known components (depth-wise convolutions, U-shaped architectures, attention gates) into yet another UNet variant, without introducing fundamentally new insights or mechanisms. One even detailed how each "new" module mapped directly to a block in the prior EMCAD paper — same code, new wrapper.

    • The Authors’ Defense:
      The authors, for their part, argued forcefully that how you stitch matters:

      “Our multi-kernel trick allows for same-size or different-size kernels, while prior work only did multi-scale. That’s a new degree of freedom, not just cosmetic surgery!”
      They also emphasized that their focus on a lightweight encoder and 3D generalization marked a conceptual shift. But the reviewers were unmoved, calling the distinction marginal — “N to N+1,” not a leap.

    The Stitching Paradox: Incremental vs. Fundamental

    Here’s the tension:

    • Stitching is not always bad. After all, most advances in deep learning (ResNet, MobileNet, EfficientNet) began as clever combinations and refinements of earlier ideas.
    • The problem arises when stitching is incremental: simply swapping parts without a real conceptual leap, or when the stitched modules solve the same problem the same way as previous work. Then the community sees it as “just another Frankenstein,” not a breakthrough.

    Why “Stitching” Gets Strong Rejects (at ICLR and Beyond)

    • “Too Easy to Replicate”: If a paper can be built by simply grabbing code from three open-sourced models and gluing it together, it’s unlikely to inspire or move the field forward.
    • “No New Theory, No New Insight”: Top-tier venues want new directions, not just new recipes. Reviewers want to see either (1) a theoretical advance, (2) an empirical surprise, or (3) a general principle, not just a more efficient wiring diagram.
    • Benchmarking Fatigue: As reviewer zJLj pointed out, the incremental gain in Dice scores did not justify a new entry in the ever-expanding UNet family, especially when baselines weren’t optimally tuned.

    What Counts as Good Stitching?

    • Engineering Value vs. Scientific Value: If your “stitching” produces real-world benefits (dramatic speed, memory, or cost reductions), be up front about the engineering focus and show that it matters in deployment. But don’t oversell it as a “novel architecture” in the scientific sense.
    • Combining Modules in Unusual Ways: Sometimes, new connections really do lead to new phenomena or insights—think “attention is all you need.” But that bar is high, and reviewers have seen every possible permutation of UNet modules by now.
    • Explain the Rationale: If you do stitch, explain why the combination is more than the sum of its parts. Ideally, provide ablation, theory, or empirical evidence showing emergent behavior (a toy ablation sketch follows this list).
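
    To make the ablation point concrete, here is a deliberately tiny sketch that reuses the hypothetical MKIRBlock from earlier and varies only the kernel-size set, so parameter counts (and, in a real study, Dice scores) can be compared like-for-like. Everything here is illustrative; the paper’s actual ablations live in its OpenReview thread.

    ```python
    import torch

    # Assumes the MKIRBlock sketch from earlier in this post is in scope.

    def count_params(module):
        """Total number of trainable parameters in a module."""
        return sum(p.numel() for p in module.parameters() if p.requires_grad)

    x = torch.randn(1, 32, 64, 64)  # one dummy batch of 32-channel feature maps

    # Vary only the kernel-size set; any change in cost (or, in a real
    # ablation, in segmentation quality) is then attributable to the
    # multi-kernel mixing and nothing else.
    for kernels in [(3,), (3, 5), (1, 3, 5)]:
        block = MKIRBlock(channels=32, kernel_sizes=kernels)
        with torch.no_grad():
            out = block(x)
        print(f"kernels={kernels}: params={count_params(block):,}, out={tuple(out.shape)}")
    ```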

    CSPaper Review Result

    This paper was recently run through CSPaper Review (https://review.cspaper.org). The review engine flagged a high similarity to the “EMCAD” method, a CVPR 2024 paper. CSPaper wasn’t wrong: the same suspicion would soon doom the paper at ICLR.

    Figure 1. Desk review on CSPaper.org. Spoiler: reviewers agree with the bot.


    The Broader Lessons: Why This Matters

    • Peer Review Is (Usually) Not Stupid: Reviewers will spot module recycling and insufficient differentiation, especially when there’s overlap with your prior work. Don’t just rename; rethink.
    • Pre-Submission Review Tools Are Useful: If CSPaper.org flags a similarity, there’s a good chance real reviewers will too. Use these bots as a sanity check, not a rubber stamp.
    • Theory > Tweaks: Top conferences crave fundamentally new ideas or theoretical insights, not just more efficient UNet variants.
    • Transparency in Experiments: Copy-pasting baseline results from another non-benchmark paper? Reviewers will notice, and it won’t end well.

    Conclusion: “N to N+1” ≠ SOTA

    In the end, UltraLightUNet ran afoul of the old “module soup” problem: in deep learning, recombining old blocks isn’t enough. Especially not at ICLR, where reviewers are allergic to “just another UNet.”

    But if you’re building for real-world deployment, not just paper glory, you might still find value in UltraLightUNet's extreme efficiency tricks — just remember to cite EMCAD (and maybe call your model “UNetNextNext”).


    Have your own peer-review horror story? Drop it in the comments or create a post. Bonus points if it includes kitchen sink analogies.
