Abstract
The concept of DNA storage was first suggested in 1959 by Richard Feynman, who shared his vision of nanotechnology in the talk "There is plenty of room at the bottom". Later, towards the end of the 20th century, interest in storage solutions based on DNA molecules increased as a result of the Human Genome Project, which in turn led to significant progress in sequencing and assembly methods. DNA storage enjoys major advantages over the well-established magnetic and optical storage solutions. As opposed to magnetic solutions, DNA storage does not require an electrical supply to maintain data integrity, and it is superior to other storage solutions in both density and durability. Given the trend of decreasing costs in DNA synthesis and sequencing, it is now acknowledged that within the next 10-15 years DNA storage may become a highly competitive archiving technology, and probably later the main such technology. With that said, current implementations of DNA-based storage systems are very limited and are not fully optimized to address the unique pattern of errors that characterizes the synthesis and sequencing processes. In this work, we propose a robust, efficient and scalable solution for implementing DNA-based storage systems. Our method deploys Deep Neural Networks (DNN) that reconstruct a sequence of letters from an imperfect cluster of copies generated by the synthesis and sequencing processes. A tailor-made Error-Correcting Code (ECC) is utilized to combat the patterns of errors that occur during this process. Since our reconstruction method is adapted to imperfect clusters, it overcomes the time bottleneck of clustering the noisy DNA copies by allowing the use of a rapid and scalable pseudo-clustering instead. Our architecture combines convolutional and transformer blocks and is trained using synthetic data modelled after real data statistics.
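To make the notion of a cluster of noisy copies concrete, the following is a minimal sketch of how synthetic training clusters might be generated. It assumes a simple independent per-base channel with substitutions, insertions and deletions; the function names `noisy_copy` and `make_cluster` and the default error rates are hypothetical placeholders for illustration, not the statistics or the procedure used in this work.

```python
import random

ALPHABET = "ACGT"

def noisy_copy(strand, p_sub, p_ins, p_del, rng):
    """Return one erroneous copy of `strand`.

    Assumption: errors are independent per base. Real synthesis and
    sequencing channels are position- and context-dependent.
    """
    out = []
    for base in strand:
        # Insertion: emit a random extra base before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            # Deletion: the current base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace with a different base.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)

def make_cluster(strand, n_copies, p_sub=0.01, p_ins=0.01, p_del=0.01, seed=0):
    """Generate `n_copies` noisy reads of one designed strand.

    The default rates are illustrative placeholders only.
    """
    rng = random.Random(seed)
    return [noisy_copy(strand, p_sub, p_ins, p_del, rng) for _ in range(n_copies)]

if __name__ == "__main__":
    design = "ACGTACGTTGCA"
    for read in make_cluster(design, n_copies=5, p_sub=0.05, p_ins=0.05, p_del=0.05):
        print(read)
```

A reconstruction model would take such a list of reads, which may differ in length from the designed strand, as input and be trained to output the original strand. In an imperfect cluster, some reads would additionally originate from other strands, which this sketch does not model.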