OpenPrint 20260410.0001v1 · Method · Released: April 10, 2026

Context-Aware Semantic Segmentation via Stage-Wise Attention

Antoine Carreaud, Nina Lahellec, Elias Naha, Jan Skaloud, Arthur Chansel, Adrien Gressin
CVPR

Abstract

Semantic segmentation of ultra-high-resolution (UHR) images is essential in remote sensing applications such as aerial mapping and environmental monitoring. Applying transformer-based models in this setting remains challenging because memory grows quadratically with the number of tokens, forcing a trade-off between spatial resolution and contextual scope. We introduce CASWiT (Context-Aware Stage-Wise Transformer), a dual-branch Swin-based architecture that injects low-resolution contextual information into fine-grained high-resolution features through lightweight stage-wise cross-attention. To strengthen cross-scale learning, we also propose a SimMIM-style pretraining strategy based on masked reconstruction of the high-resolution image. Extensive experiments on the large-scale FLAIR-HUB aerial dataset demonstrate the effectiveness of CASWiT. Under our RGB-only UHR protocol, CASWiT reaches 66.37% mIoU with a SegFormer decoder, outperforming strong RGB baselines and improving boundary quality. On the URUR benchmark, CASWiT reaches 49.2% mIoU under the official evaluation protocol, and it also transfers effectively to medical UHR segmentation benchmarks. Code and pretrained models are available at https://huggingface.co/collections/heig-vd-geo/caswit.
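The core fusion step described above, where high-resolution features query low-resolution contextual features, can be illustrated with a minimal single-head cross-attention sketch. This is an assumption-laden toy, not the paper's implementation: the function name, shapes, and residual injection are illustrative choices, and the real CASWiT applies such fusion per Swin stage with learned projections.

```python
import numpy as np

def context_cross_attention(hr_tokens, ctx_tokens):
    """Hypothetical single-head cross-attention sketch (not CASWiT's code).

    hr_tokens:  (N, d) fine-grained high-resolution features (queries)
    ctx_tokens: (M, d) low-resolution contextual features (keys/values)
    Returns context-enriched features of shape (N, d).
    """
    d = hr_tokens.shape[-1]
    # Scaled dot-product logits: each HR token attends over all context tokens.
    logits = hr_tokens @ ctx_tokens.T / np.sqrt(d)          # (N, M)
    # Numerically stable softmax over the context axis.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    fused = weights @ ctx_tokens                            # (N, d)
    # Residual injection keeps the fine-grained branch dominant.
    return hr_tokens + fused

rng = np.random.default_rng(0)
hr = rng.standard_normal((16, 8))   # 16 high-res tokens, dim 8
ctx = rng.standard_normal((4, 8))   # 4 coarse context tokens
out = context_cross_attention(hr, ctx)
print(out.shape)  # (16, 8)
```

In a stage-wise design, a block like this would sit after each encoder stage, so coarse context is injected repeatedly rather than only at the bottleneck.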

Keywords

ultra-high-resolution segmentation, dual-branch architecture, stage-wise cross-attention, Swin Transformer, SimMIM pretraining, context-aware fusion, FLAIR-HUB

Citation

@article{Carreaud2026ContextAware,
  title={Context-Aware Semantic Segmentation via Stage-Wise Attention},
  author={Antoine Carreaud and Nina Lahellec and Elias Naha and Jan Skaloud and Arthur Chansel and Adrien Gressin},
  year={2026},
  url={https://cspaper.org/openprint/20260410.0001v1},
  journal={OpenPrint:20260410.0001v1}
}

Version History

Version | Released Date | Submitter
v1 (Current) | Apr 10, 2026 | Antoine Carreaud