LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling

Abstract

Machine learning algorithms such as Support Vector Machines (SVM) and Deep Neural Networks (DNN) have attracted considerable interest recently. When training a machine learning model, randomly shuffling all the training data can improve testing accuracy and speed up convergence. Nevertheless, realizing random shuffling of training data in a real system is not straightforward because random accesses on a hard disk drive (HDD) are slow. To avoid frequent random disk accesses, existing approaches often limit the extent of random shuffling. With emerging non-volatile memory-based storage devices, such as the Intel Optane SSD, which provide fast random accesses, we propose a lightweight implementation of random shuffling (LIRS) that randomly shuffles the indexes of the entire training dataset; the selected training instances are then accessed directly from storage and packed into batches. Experimental results show that LIRS reduces the total training time of SVM and DNN by 49.9% and 43.5% on average, respectively, and improves the final testing accuracy on DNN by 1.01%.
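
The index-shuffling idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the on-disk layout (fixed-size records of four float32 values), the names `RECORD_FORMAT`, `write_dataset`, and `shuffled_batches`, and the use of plain file seeks are all assumptions made for illustration. Only the list of indexes is shuffled in memory; each selected instance is then fetched from storage with one random-access read and packed into a batch.

```python
import random
import struct

# Hypothetical record layout: four little-endian float32 values per instance.
RECORD_FORMAT = "<4f"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def write_dataset(path, records):
    """Write records back to back so record i starts at byte i * RECORD_SIZE."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(struct.pack(RECORD_FORMAT, *rec))


def shuffled_batches(path, num_records, batch_size, seed):
    """Yield batches of (index, record) pairs covering the dataset in shuffled order.

    The data file itself is never reordered: only the index list is shuffled,
    and each record is read directly from its offset in the file.
    """
    indexes = list(range(num_records))
    random.Random(seed).shuffle(indexes)  # shuffle indexes over the entire dataset
    with open(path, "rb") as f:
        for start in range(0, num_records, batch_size):
            batch = []
            for i in indexes[start:start + batch_size]:
                f.seek(i * RECORD_SIZE)  # random access to instance i
                raw = f.read(RECORD_SIZE)
                batch.append((i, struct.unpack(RECORD_FORMAT, raw)))
            yield batch
```

Calling `shuffled_batches(path, num_records, batch_size, seed)` with a different seed per epoch gives each epoch a fresh permutation of the whole dataset. On an HDD the per-record seeks in this pattern would dominate the running time, which is why the approach depends on storage with fast random accesses.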
