RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining

Abstract

Outdoor vision systems are frequently degraded by rain streaks and raindrops, which significantly impair the performance of visual tasks and multimedia applications. Videos offer redundant temporal cues that make rain removal more stable. Traditional video deraining methods rely heavily on optical flow estimation and kernel-based approaches, which have a limited receptive field. Transformer architectures capture long-term dependencies, but at a significant increase in computational complexity. Recently, the linear-complexity operator of state space models (SSMs) has instead enabled efficient long-term temporal modeling, which is crucial for removing rain streaks and raindrops in videos. However, its one-dimensional sequential processing of videos destroys local correlations across the spatio-temporal dimension by distancing adjacent pixels. To address this, we present an improved SSM-based video deraining network (RainMamba) with a novel Hilbert scanning mechanism to better capture sequence-level local information. We also introduce a difference-guided dynamic contrastive locality learning strategy to enhance the patch-level self-similarity learning ability of the proposed network. Extensive experiments on four synthesized video deraining datasets and real-world rainy videos demonstrate the superiority of our network in removing rain streaks and raindrops.
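
The locality argument behind Hilbert scanning can be illustrated with a small sketch. The code below is not the RainMamba implementation: it is a simplified 2D illustration (the paper's scan works across the spatio-temporal dimension), and the function names `hilbert_order`, `raster_order`, and `max_step` are hypothetical. It flattens an n x n grid in two ways and measures the largest spatial jump between consecutive sequence elements, assuming n is a power of two.

```python
def _rot(s, x, y, rx, ry):
    """Rotate/flip a quadrant of side s (standard Hilbert curve helper)."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_order(n):
    """Return the (x, y) cells of an n x n grid in Hilbert-curve order.

    Assumes n is a power of two.
    """
    cells = []
    for d in range(n * n):
        x = y = 0
        t = d
        s = 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            x, y = _rot(s, x, y, rx, ry)
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        cells.append((x, y))
    return cells


def raster_order(n):
    """Return the (x, y) cells of an n x n grid in row-by-row order."""
    return [(x, y) for y in range(n) for x in range(n)]


def max_step(cells):
    """Largest Manhattan distance between consecutive cells in a sequence."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(cells, cells[1:])
    )


if __name__ == "__main__":
    n = 8
    print("raster  max step:", max_step(raster_order(n)))
    print("hilbert max step:", max_step(hilbert_order(n)))
```

In the raster ordering, each row boundary forces a jump across the full width of the grid, whereas consecutive elements of the Hilbert ordering are always spatial neighbors. This is only one facet of locality, and how the actual network builds and uses its scan paths is described in the full paper.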
