Text to Image Generation: Leaving no Language Behind

Abstract

One of the latest applications of Artificial Intelligence (AI) is to generate images from natural language descriptions. These generators are now becoming available and achieve impressive results that have been used, for example, on the front covers of magazines. As the input to the generators is natural language text, a question that arises immediately is how these models behave when the input is written in different languages. In this paper, we perform an initial exploration of how the performance of three popular text-to-image generators depends on the language. The results show that there is a significant performance degradation when using languages other than English, especially for languages that are not widely used. This observation leads us to discuss different alternatives for improving text-to-image generators so that performance is consistent across languages. This is fundamental to ensure that this new technology can be used by non-native English speakers and to preserve linguistic diversity.
