Deep Learning for Classical Japanese Literature

Abstract

Much of machine learning research focuses on producing models that perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the perspective of ML researchers, the content of the task itself is largely irrelevant, and thus there have been increasing calls for benchmark tasks to focus more heavily on problems of social or cultural relevance. In this work, we introduce Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji. Through these datasets, we wish to engage the machine learning community with the world of classical Japanese literature. The datasets are available at https://github.com/rois-codh/kmnist
