Abstract
Patent data provides rich information about technical inventions, but does not disclose the ethnic origin of inventors. In this paper, I use supervised learning techniques to infer this information. To do so, I construct a dataset of 95,202 labeled names and train a recurrent neural network with long short-term memory (LSTM) to predict ethnic origins based on names. The trained network achieves an overall performance of 91% across 17 ethnic origins. I use this model to classify and investigate the ethnic origins of 2.68 million inventors and provide novel descriptive evidence regarding their ethnic origin composition over time and across countries and technological fields. The global ethnic origin composition has become more diverse over the last decades, mostly due to a relative increase in inventors of Asian origin. Furthermore, the prevalence of foreign-origin inventors is especially high in the USA, but has also increased in other high-income economies. In the USA, but not in other high-income countries, this increase was mainly driven by an inflow of non-Western inventors into emerging high-technology fields.