Abstract
Deep neural networks~(DNNs) have proven powerful for denoising, but they are ultimately of limited use in high-noise settings, such as for cryogenic electron microscopy~(cryo-EM) projection images. In this setting, however, datasets contain a large number of projections of the same molecule, each taken from a different viewing direction. Traditional denoising techniques known as class averaging methods exploit this redundancy: images are clustered, aligned, and then averaged to reduce the noise level. We present a neural network architecture based on transformers that extends these class averaging methods by simultaneously clustering, aligning, and denoising cryo-EM images. Results on synthetic data show accurate denoising performance with this architecture, which reduces the relative mean squared error~(MSE) by $45\%$ compared with single-image DNNs at a signal-to-noise ratio~(SNR) of $0.03$.
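The following minimal sketch illustrates the align-then-average step of classical class averaging. It is a toy stand-in, not the transformer architecture presented here, and it rests on strong simplifying assumptions. It uses 1D signals instead of 2D projections and treats cyclic shifts as the only nuisance transformation, whereas real projections also differ by in-plane rotations and translations. It assumes every image already belongs to a single class, so clustering is skipped. It also aligns against a known clean template, which real cryo-EM data do not provide. The helper names (`cyclic_shift`, `best_shift`, `class_average`, `relative_mse`, `demo`) are hypothetical. The sketch defines relative MSE as $\|\hat{x}-x\|^2/\|x\|^2$ and SNR as signal power divided by noise variance. These are assumed definitions, not necessarily the exact ones used in the paper. The demo runs at SNR 0.5 rather than 0.03 so that naive template matching aligns reliably.

```python
import math
import random


def relative_mse(estimate, truth):
    """Assumed definition: ||estimate - truth||^2 / ||truth||^2."""
    num = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    den = sum(t * t for t in truth)
    return num / den


def cyclic_shift(signal, s):
    """Return the signal cyclically shifted right by s samples."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def best_shift(image, template):
    """Shift s maximizing sum_i template[i] * image[(i + s) % n] (cross-correlation)."""
    n = len(template)
    scores = [sum(template[i] * image[(i + s) % n] for i in range(n)) for s in range(n)]
    return max(range(n), key=scores.__getitem__)


def class_average(images, template):
    """Align each image to the template by undoing its estimated shift, then average."""
    n = len(template)
    total = [0.0] * n
    for img in images:
        aligned = cyclic_shift(img, -best_shift(img, template))
        for i in range(n):
            total[i] += aligned[i]
    return [v / len(images) for v in total]


def demo(n=128, num_images=100, snr=0.5, seed=0):
    """Toy experiment: one clean +/-1 signal observed under random shifts plus Gaussian noise."""
    rng = random.Random(seed)
    clean = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    power = sum(v * v for v in clean) / n
    sigma = math.sqrt(power / snr)  # assumed SNR = signal power / noise variance
    images = []
    for _ in range(num_images):
        shifted = cyclic_shift(clean, rng.randrange(n))  # stand-in for a new viewing direction
        images.append([v + rng.gauss(0.0, sigma) for v in shifted])
    single = class_average(images[:1], clean)  # a single aligned noisy image, no averaging
    averaged = class_average(images, clean)    # align all images, then average
    return relative_mse(single, clean), relative_mse(averaged, clean)


single_err, avg_err = demo()
print(f"single image relative MSE:  {single_err:.3f}")
print(f"class average relative MSE: {avg_err:.3f}")
```

When alignment succeeds, averaging $N$ aligned copies divides the noise variance by roughly $N$. The class average's relative MSE should then fall to about $1/(N \cdot \mathrm{SNR})$, versus about $1/\mathrm{SNR}$ for a single image. At much lower SNR, the alignment step itself becomes unreliable, and that step is what the learned approach aims to handle jointly with clustering and denoising.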