Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

  • 2024-05-14 18:30:00
  • William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

Abstract

The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on complementary metal-oxide-semiconductor chips that closely resemble market-ready spin-transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover performance comparable to that of ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.
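The difference between hardware-aware and statistics-aware training can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single-layer softmax classifier, models every defect as a device stuck at a weight of zero, and draws a fresh random defect mask at each training step so that the weights never adapt to one die's defect locations. All function names, the stuck-at-zero defect model, and the hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def sample_defect_mask(shape, defect_rate, rng):
    """Boolean mask, True where a device is assumed defective (hypothetical model)."""
    return rng.random(shape) < defect_rate

def apply_defects(weights, mask, stuck_value=0.0):
    """Weights as a defective die would realize them: masked entries are stuck."""
    return np.where(mask, stuck_value, weights)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_statistics_aware(x, y, n_classes, defect_rate, steps=300, lr=0.1, seed=0):
    """Gradient descent on cross-entropy with a new random defect mask every step."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(x.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        mask = sample_defect_mask(w.shape, defect_rate, rng)  # a new simulated die
        probs = softmax(x @ apply_defects(w, mask))
        grad = x.T @ (probs - onehot) / len(x)
        grad[mask] = 0.0  # stuck devices do not depend on the stored weight
        w -= lr * grad
    return w

def error_rate(w, x, y, mask):
    """Misclassification rate of weights w when mapped onto a die with this mask."""
    pred = (x @ apply_defects(w, mask)).argmax(axis=1)
    return float((pred != y).mean())

def mean_error_over_dies(w, x, y, masks):
    """Average misclassification rate over a collection of per-die defect masks."""
    return float(np.mean([error_rate(w, x, y, m) for m in masks]))
```

In this picture, hardware-aware training would correspond to fixing one measured mask for a given die throughout training, while the statistics-aware variant only needs the defect rate. Evaluating with `mean_error_over_dies` over a set of masks mimics, in a very simplified way, reporting a mean error across many dies.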

 
