Abstract
Owing to diverse geographical environments, intricate landscapes, and high-density settlements, automatically identifying urban village boundaries in remote sensing images is highly challenging. This paper proposes a novel and efficient neural network model, UV-Mamba, for accurate boundary detection in high-resolution remote sensing images. By incorporating deformable convolutions (DCN), UV-Mamba mitigates the memory loss problem in long-sequence modeling that arises in state space models (SSMs) as image size increases. Its architecture follows an encoder-decoder framework: an encoder with four deformable state space augmentation (DSSA) blocks performs efficient multi-level semantic extraction, and a decoder integrates the extracted semantic information. We conducted experiments on the Beijing and Xi'an datasets, and the results show that UV-Mamba achieves state-of-the-art performance. Specifically, our model achieves 73.3% and 78.1% IoU on the Beijing and Xi'an datasets, respectively, representing improvements of 1.2% and 3.4% IoU over the previous best model, while also being 6x faster in inference and 40x smaller in parameter count. Source code and pre-trained models are available in the supplementary material.
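
For readers unfamiliar with the reported metric, intersection over union (IoU) for a binary segmentation mask can be sketched as follows. This is a minimal illustration only: the function name `binary_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made here, and the abstract does not specify how IoU is aggregated over classes or images in the actual evaluation.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            inter += p and t  # pixel marked as urban village in both masks
            union += p or t   # pixel marked as urban village in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))  # 2 / 4 = 0.5
```

Under this definition, a score of 73.3% means that the overlap between predicted and reference regions covers 73.3% of their combined area.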