Abstract
High-spatial-resolution hyperspectral images (HSI) are essential for applications such as remote sensing and medical imaging, yet HSI sensors inherently trade spatial detail for spectral richness. Fusing high-spatial-resolution multispectral images (HR-MSI) with low-spatial-resolution hyperspectral images (LR-HSI) is a promising route to recover fine spatial structures without sacrificing spectral fidelity. Most state-of-the-art methods for HSI-MSI fusion demand point spread function (PSF) calibration or ground-truth high-resolution HSI (HR-HSI), both of which are impractical to obtain in real-world settings. We present SpectraLift, a fully self-supervised framework that fuses LR-HSI and HR-MSI inputs using only the MSI's Spectral Response Function (SRF). SpectraLift trains a lightweight per-pixel multi-layer perceptron (MLP) network using ($i$)~a synthetic low-spatial-resolution multispectral image (LR-MSI) obtained by applying the SRF to the LR-HSI as input, ($ii$)~the LR-HSI as the output, and ($iii$)~an $\ell_1$ spectral reconstruction loss between the estimated and true LR-HSI as the optimization objective. At inference, SpectraLift uses the trained network to map the HR-MSI pixel-wise into an HR-HSI estimate. SpectraLift converges in minutes, is agnostic to spatial blur and resolution, and outperforms state-of-the-art methods on PSNR, SAM, SSIM, and RMSE benchmarks.
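The training and inference recipe described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The toy endmember spectra, the box-shaped SRF, the single hidden layer of 8 units, the plain per-pixel subgradient descent, and all names (`apply_srf`, `PixelMLP`, `l1_loss`) are assumptions made for illustration. Images are treated as flat lists of pixels, since the network operates per pixel, and the HR-MSI is simulated here, whereas in practice it would be an observed image.

```python
import random

def apply_srf(pixels, srf):
    """Map hyperspectral pixels (B bands) to multispectral pixels via a b x B SRF matrix."""
    return [[sum(r * v for r, v in zip(row, px)) for row in srf] for px in pixels]

class PixelMLP:
    """One-hidden-layer ReLU MLP: MSI pixel (b values) -> HSI pixel (B values)."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        pre = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        hid = [max(0.0, p) for p in pre]
        out = [sum(w * h for w, h in zip(row, hid)) + b for row, b in zip(self.w2, self.b2)]
        return hid, out

    def sgd_step(self, x, target, lr):
        """One subgradient step on the l1 loss sum_k |out_k - target_k| for one pixel."""
        hid, out = self.forward(x)
        g_out = [(o > t) - (o < t) for o, t in zip(out, target)]  # sign(out - target)
        # Backpropagate through the output layer before its weights are updated.
        g_hid = [sum(g * self.w2[k][j] for k, g in enumerate(g_out)) if h > 0 else 0.0
                 for j, h in enumerate(hid)]
        for k, g in enumerate(g_out):
            self.b2[k] -= lr * g
            for j, h in enumerate(hid):
                self.w2[k][j] -= lr * g * h
        for j, g in enumerate(g_hid):
            self.b1[j] -= lr * g
            for i, v in enumerate(x):
                self.w1[j][i] -= lr * g * v

def l1_loss(model, inputs, targets):
    """Mean absolute error over all pixels and bands."""
    total = sum(abs(o - t) for x, tgt in zip(inputs, targets)
                for o, t in zip(model.forward(x)[1], tgt))
    return total / (len(inputs) * len(targets[0]))

# Toy data (hypothetical): 6 HSI bands, 3 MSI bands, two-endmember linear mixing.
rng = random.Random(0)
srf = [[0.5, 0.5, 0, 0, 0, 0], [0, 0, 0.5, 0.5, 0, 0], [0, 0, 0, 0, 0.5, 0.5]]
e1 = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
e2 = [0.9, 0.7, 0.5, 0.3, 0.2, 0.1]

def mix(a):
    return [a * p + (1 - a) * q for p, q in zip(e1, e2)]

lr_hsi = [mix(rng.random()) for _ in range(32)]        # observed LR-HSI pixels
hr_hsi_true = [mix(rng.random()) for _ in range(128)]  # used only to simulate the HR-MSI
hr_msi = apply_srf(hr_hsi_true, srf)                   # stands in for the observed HR-MSI

# (i) synthetic LR-MSI as input, (ii) LR-HSI as target, (iii) l1 loss as objective.
lr_msi = apply_srf(lr_hsi, srf)
model = PixelMLP(n_in=3, n_hidden=8, n_out=6, rng=rng)
initial_loss = l1_loss(model, lr_msi, lr_hsi)
for epoch in range(200):
    for x, t in zip(lr_msi, lr_hsi):
        model.sgd_step(x, t, lr=0.01)
final_loss = l1_loss(model, lr_msi, lr_hsi)

# Inference: map every HR-MSI pixel to an HR-HSI estimate.
hr_hsi_est = [model.forward(x)[1] for x in hr_msi]
```

Neither training nor inference uses a PSF or any spatial neighborhood, which is one way to read the claim that the approach is agnostic to spatial blur and resolution. A practical implementation would presumably use a tensor library and a stronger optimizer, and comparing `initial_loss` with `final_loss` gives a quick sanity check on the toy data.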