Adaptive Patching for High-resolution Image Segmentation with Transformers

Abstract

Attention-based models are proliferating in image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide each image into patches and feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost of attention prohibits the use of an attention-based model with the smaller patch sizes that are favorable in segmentation. Existing solutions either use custom, complex multi-resolution models or approximate attention schemes. We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC and adaptively patch the images, as a pre-processing step, based on the image details, reducing the number of patches fed to the model by orders of magnitude. This method has negligible overhead and works seamlessly with any attention-based model, i.e. it is a pre-processing step that can be adopted by any attention-based model without friction. We demonstrate superior segmentation quality over SoTA segmentation models for real-world pathology datasets while gaining a geomean speedup of $6.9\times$ for resolutions up to $64K^2$, on up to $2,048$ GPUs.
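To make the idea of detail-driven patching concrete, the following is a minimal sketch of a quadtree-style subdivision, in the spirit of AMR. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function name `adaptive_patches`, the use of pixel variance as the detail criterion, and the `min_size` and `threshold` parameters are all hypothetical choices made for this example. The image is assumed to be a square 2-D array whose side length is a power of two.

```python
import numpy as np


def adaptive_patches(image, min_size=4, threshold=1e-3):
    """Return quadtree leaves as (row, col, size) tuples.

    Hypothetical sketch: a block is split into four quadrants while its
    pixel variance exceeds `threshold` and it is larger than `min_size`.
    Variance is an assumed stand-in for "image detail".
    """
    side = image.shape[0]
    leaves = []
    stack = [(0, 0, side)]  # blocks still to be examined
    while stack:
        r, c, s = stack.pop()
        block = image[r:r + s, c:c + s]
        if s > min_size and block.var() > threshold:
            h = s // 2
            # Detailed region: refine into four equal quadrants.
            stack.extend([(r, c, h), (r, c + h, h),
                          (r + h, c, h), (r + h, c + h, h)])
        else:
            # Smooth region (or smallest allowed size): keep as one patch.
            leaves.append((r, c, s))
    return leaves


# Toy image: detail (a checkerboard) only in the top-left 8x8 corner.
image = np.zeros((64, 64))
image[:8, :8] = np.indices((8, 8)).sum(axis=0) % 2

leaves = adaptive_patches(image)
uniform_count = (64 // 4) ** 2  # patches in a uniform 4x4 grid

print(len(leaves), "adaptive patches vs", uniform_count, "uniform patches")
```

On this toy image the fine patches are confined to the detailed corner while the empty background is covered by a few large patches. Because self-attention cost grows quadratically with sequence length, reducing the patch count by a factor of $k$ reduces the attention cost by roughly $k^2$. In a real pipeline the leaves would also have to be resized to a common patch size before tokenization, which this sketch omits.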
