When SSIM Hits 7.0: A Peer Review Caution from ECAI'24

Artificial Intelligence & Machine Learning
Tags: ecai2024, evaluation metrics, sota, peer review, ethics
    Sylvia (Super Users) · #1

    In the world of computer vision, especially within style transfer and image generation research, evaluation metrics like SSIM, FID, and PSNR are household names. Most early-stage researchers know their significance and limitations. Yet, what happens when an SSIM score (by definition bounded between -1 and 1) suddenly clocks in at 6.913?

    Welcome to a strange episode from ECAI 2024, where a paper titled "Progressive Artistic Aesthetic Enhancement For Chinese Ink Painting Style Transfer" made waves, not for its aesthetic appeal but for the metrics behind it. The case offers a rare and valuable lesson on peer review rigor, the dangers of metric misuse, and the cultural dynamics surrounding academic integrity.



    The SSIM incident

    On the surface, the ECAI paper presents an image style transfer method for Chinese ink painting, with comparisons against ChipGAN, AnimeGAN, TwinGAN, and other baselines. But buried in Table 1, one number stood out: the SSIM of the proposed method, PAAEST, reached 6.913, and several other entries also exceeded 1.

    For the uninitiated:

    SSIM (Structural Similarity Index Measure) is a metric that ranges from -1 to 1, where 1 indicates perfect similarity between two images. Values beyond that are simply... mathematically impossible.
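
    For reference, the standard formulation from Wang et al. (2004) makes the bound explicit:

    ```latex
    \mathrm{SSIM}(x, y) =
      \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
           {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}
    ```

    With stabilizing constants C1, C2 > 0, the first factor lies in [0, 1] for non-negative pixel means (since 2μxμy ≤ μx² + μy²), and |2σxy| ≤ σx² + σy² caps the second factor's magnitude at 1, so the product can never leave [-1, 1]. An SSIM of 6.913 is not a borderline judgment call; it is arithmetic that cannot happen.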

    This anomaly didn't go unnoticed. A sharp-eyed peer reviewer or attendee at the conference (depending on the timeline) questioned it. What followed was an unusual and public unraveling.


    From inflated metrics to white-out corrections on the poster

    Photos surfaced on XiaoHongShu showing the author standing next to his ECAI poster, and something was off: the original SSIM values had been visibly whited out with correction fluid and rewritten to more "reasonable" values like 0.9. This quick fix raised more red flags than it resolved.

    [Photo: the poster, with SSIM values whited out and rewritten]

    Soon, further inconsistencies were discovered:

    • The values later posted on GitHub were changed yet again, this time to roughly 0.718.
    • The explanation offered was that SSIM had mistakenly been computed between the stylized images and the style reference image, rather than the content image, violating evaluation conventions (see the sketch after this list).
    • The same author had also published 10+ first-author papers within a single year across unrelated fields: AI safety, economics, vehicle design, even poetry generation.
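
    Even taken at face value, that explanation doesn't rescue the number. Here is a minimal sketch of the two pairings using scikit-image, with random arrays as stand-ins (all names here are illustrative, not the paper's data or code):

    ```python
    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    # Illustrative stand-ins for the real images (assumed, not from the paper).
    rng = np.random.default_rng(0)
    content_img = rng.random((128, 128))                         # source content photo
    style_ref = rng.random((128, 128))                           # ink-painting style exemplar
    stylized = 0.5 * content_img + 0.5 * rng.random((128, 128))  # stand-in model output

    # Convention in style transfer: SSIM gauges how much content structure
    # survives, so the output is compared against the CONTENT image.
    print(ssim(stylized, content_img, data_range=1.0))

    # The reported mistake: comparing the output to the style exemplar instead.
    print(ssim(stylized, style_ref, data_range=1.0))
    ```

    Both calls stay inside [-1, 1] no matter which reference is used, so the wrong-reference explanation accounts for a misleading score, not an impossible one.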


    The broader peer review implications

    This story exposes cracks in the peer review system that many early-career researchers must confront:

    1. Review of evaluation metrics

    Too many papers get by with shallow metric reporting. Reviewers may skim tables without questioning feasibility. The SSIM=6.913 case is a cautionary tale: if you're reviewing and see anything off, speak up.
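
    In practice, that feasibility pass can be almost mechanical. A minimal sketch of the idea (the bounds table and function are hypothetical, not from any reviewing tool):

    ```python
    # Mathematically valid ranges for two common image metrics.
    # SSIM is bounded by construction; FID is a Frechet distance, hence >= 0.
    # (PSNR is omitted: it is unbounded above and, in pathological cases, below.)
    METRIC_BOUNDS = {
        "SSIM": (-1.0, 1.0),
        "FID": (0.0, float("inf")),
    }

    def is_impossible(metric: str, value: float) -> bool:
        """Return True if a reported value lies outside the metric's valid range."""
        lo, hi = METRIC_BOUNDS[metric]
        return not (lo <= value <= hi)

    print(is_impossible("SSIM", 6.913))  # True: no SSIM can exceed 1
    print(is_impossible("SSIM", 0.718))  # False: at least plausible on its face
    print(is_impossible("FID", -3.2))    # True: a distance cannot be negative
    ```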

    2. Quantity over quality?

    An undergraduate publishing 10+ solo papers spanning NLP, CV, GANs, and econ within a year is not just impressive; it raises credibility concerns. True interdisciplinarity is rare and usually collaborative. Solo authorship across so many fields invites scrutiny.

    3. The dangers of non-open research

    None of the author's papers were open-sourced, so the claims of SOTA (state-of-the-art) performance went unverified. Without public code, independent replication is effectively impossible, which runs against the spirit of scientific inquiry.


    Lessons

    This isn’t just gossip. It's a systemic warning.

    For early-stage researchers:

    • Know your metrics deeply. If you use SSIM, FID, or PSNR, understand their math, and use them properly.
    • Be transparent. If you make a mistake — own it. Quietly applying white-out on a poster isn’t a correction; it’s a cover-up.
    • Focus on depth over breadth. The temptation to churn out papers can lead to shortcuts that damage your long-term credibility.

    For reviewers:

    • Don't skim. Even something as small as a numeric inconsistency can signal much deeper issues.
    • Advocate for open code and reproducibility, especially when SOTA is claimed.
    • Peer review is not just gatekeeping. It’s a communal responsibility to uphold scientific integrity.

    Conclusion

    The SSIM 7.0 case might become a meme, but it points to a much more serious problem: the erosion of academic standards in the face of superficial productivity. Whether it's due to publication pressure, competitive internship applications, or blind ambition, these cases will only grow unless we build a stronger culture of accountability, mentorship, and technical literacy.

    Let this be a wake-up call. May our future metrics stay within bounds, and our science stay honest.
