Abstract
In remote sensing, most segmentation networks adopt the UNet architecture,often incorporating modules such as Transformers or Mamba to enhanceglobal-local feature interactions within decoder stages. However, theseenhancements typically focus on intra-scale relationships and neglect theglobal contextual dependencies across multiple resolutions. To address thislimitation, we introduce the Terrace Convolutional Decoder Network (TNet), asimple yet effective architecture that leverages only convolution and additionoperations to progressively integrate low-resolution features (rich in globalcontext) into higher-resolution features (rich in local details) acrossdecoding stages. This progressive fusion enables the model to learnspatially-aware convolutional kernels that naturally blend global and localinformation in a stage-wise manner. We implement TNet with a ResNet-18 encoder(TNet-R) and evaluate it on three benchmark datasets. TNet-R achievescompetitive performance with a mean Intersection-over-Union (mIoU) of 85.35\%on ISPRS Vaihingen, 87.05\% on ISPRS Potsdam, and 52.19\% on LoveDA, whilemaintaining high computational efficiency. Code is publicly available.