Modernizing Historical Documents: a User Study

  • 2019-07-01 11:13:19
  • Miguel Domingo, Francisco Casacuberta

Abstract

Accessibility to historical documents is mostly limited to scholars. This is due to the language barrier inherent in human language and the linguistic properties of these documents. Given a historical document, modernization aims to generate a new version of it, written in the modern version of the document's language. Its goal is to tackle the language barrier, decreasing the comprehension difficulty and making historical documents accessible to a broader audience. In this work, we proposed a new neural machine translation approach that profits from modern documents to enrich its systems. We tested this approach with both automatic and human evaluation, and conducted a user study. Results showed that modernization is successfully reaching its goal, although it still has room for improvement.

 


Miguel Domingo Pattern Recognition and Human Language Technology Research Center
Universitat Politècnica de València - Camino de Vera s/n, 46022 Valencia, Spain
Francisco Casacuberta Pattern Recognition and Human Language Technology Research Center
Universitat Politècnica de València - Camino de Vera s/n, 46022 Valencia, Spain

1 Introduction

Historical documents are an important part of our cultural heritage. However, the nature of human language, which evolves with the passage of time, and the linguistic properties of these documents (due to the lack of a spelling convention, orthography changes depending on the time period and the author) increase the difficulty of comprehending them. For this reason, historical documents are mostly accessible to scholars.

Modernization aims to tackle this language barrier and increase the accessibility of historical documents to a broader audience. To this end, it generates a new version of a historical document, written in the modern version of the document's original language. Figure 1 shows an example of modernizing a document. In this case, some of the language structures and rhymes have been lost. However, the modern version is easier for a broader audience to read and comprehend.

Original version: O Romeo, Romeo! Wherefore art thou Romeo? Deny thy father and refuse thy name. Or, if thou wilt not, be but sworn my love, And I’ll no longer be a Capulet. With love’s light wings did I o’erperch these walls, For stony limits cannot hold love out, And what love can do, that dares love attempt. Therefore thy kinsmen are no stop to me.
Modernized version: Oh, Romeo, Romeo, why do you have to be Romeo? Forget about your father and change your name. Or else, if you won’t change your name, just swear you love me and I’ll stop being a Capulet. I flew over these walls with the light wings of love. Stone walls can’t keep love out. Whatever a man in love can possibly do, his love will make him try to do it. Therefore your relatives are no obstacle.
Figure 1: Example of modernizing a historical document. The original text is composed of fragments from Romeo and Juliet by William Shakespeare. The modernized version was obtained from Crowther (2003).

While normalizing orthography to account for the lack of a spelling convention has been extensively researched for years (Laing, 1993; Baron and Rayson, 2008; Porta et al., 2013; Hämäläinen et al., 2018), modernization of historical documents is a young research field. One of the first related works was a shared task on translating historical text to contemporary language (Tjong Kim Sang et al., 2017). The task focused on normalizing the documents' spelling, but it also approached document modernization using a set of rules. Domingo et al. (2017) proposed a modernization approach based on statistical machine translation (SMT), and Domingo and Casacuberta (2018) proposed one based on neural machine translation (NMT). Finally, Sen et al. (2019) augmented the training data by extracting pairs of phrases and adding them as new training sentences.

In this work, we followed a machine translation (MT) approach to tackle the modernization problem. Similarly to Domingo and Casacuberta (2018), we made use of modern documents to enrich the modernization systems. However, we applied a data selection technique to make better use of these documents, selecting only the most relevant sentences for each task. We evaluated our approach both automatically and with the help of 4 scholars specialized in classic Spanish literature. Additionally, we conducted a user study with 42 people to assess whether modernization is able to decrease the difficulty of comprehending historical documents. Our main contributions are as follows:

  • We proposed a new NMT approach that successfully profits from modern documents to enrich its modernization systems.

  • We tested our proposal using 3 datasets from different languages and time periods.

  • We assessed the quality of our proposal using both automatic and human evaluation, conducted by 4 scholars specialized in classic Spanish literature.

  • To the best of our knowledge, this is the first time that an NMT modernization approach performs similarly to or better than an SMT modernization approach.

  • We conducted a study with 42 users to assess whether modernization successfully decreases the difficulty of comprehending historical documents.

The rest of this document is structured as follows: Section 2 presents the modernization approaches. Then, in Section 3, we describe the experimental framework of our work. After that, in Section 4, we present and discuss the evaluation conducted to assess our approach. Section 5 describes and presents the user study. Finally, in Section 6, conclusions are drawn.

2 Modernization approaches

In this section, we present the state-of-the-art SMT modernization approach and our NMT-based proposal. Both approaches rely on MT which, given a source sentence $\mathbf{x}$, aims at finding the most likely translation $\hat{\mathbf{y}}$ (Brown et al., 1993):

$$\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} \Pr(\mathbf{y} \mid \mathbf{x}) \qquad (1)$$

2.1 SMT approach

For years, SMT has been the prevailing approach to computing Eq. (1), using a log-linear combination of different models (Och and Ney, 2002): namely, phrase-based alignment models, reordering models and language models, among others (Zens et al., 2002; Koehn et al., 2003).
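
To make the decision rule concrete, the sketch below scores a finite list of candidate modernizations with a log-linear model, $\sum_i \lambda_i h_i(\mathbf{x}, \mathbf{y})$, and returns the argmax of Eq. (1). Because the normalizer of a log-linear model does not depend on $\mathbf{y}$, maximizing this weighted sum is equivalent to maximizing $\Pr(\mathbf{y} \mid \mathbf{x})$. The two feature functions, their weights and the tiny lexicon are hypothetical stand-ins for the phrase, reordering and language models of a real system such as Moses, whose weights would be tuned with MERT rather than set by hand; a real decoder also searches a huge hypothesis space instead of enumerating a few candidates.

```python
from typing import Callable, Dict, List

# A feature function h_i(x, y) maps a (source, candidate) pair to a real score.
FeatureFn = Callable[[str, str], float]


def log_linear_score(x: str, y: str, features: Dict[str, FeatureFn],
                     weights: Dict[str, float]) -> float:
    """Unnormalized log-linear score: sum_i lambda_i * h_i(x, y)."""
    return sum(weights[name] * h(x, y) for name, h in features.items())


def decide(x: str, candidates: List[str], features: Dict[str, FeatureFn],
           weights: Dict[str, float]) -> str:
    """Eq. (1) over a finite candidate list. The log-linear normalizer does
    not depend on y, so the highest-scoring candidate is the most probable."""
    return max(candidates, key=lambda y: log_linear_score(x, y, features, weights))


# Hypothetical toy features standing in for real SMT models.
MODERN_LEXICON = {"así", "lo", "hizo"}
TOY_FEATURES: Dict[str, FeatureFn] = {
    # Counts candidate words that appear in a (toy) modern lexicon.
    "lexicon": lambda x, y: float(sum(w in MODERN_LEXICON for w in y.split())),
    # Penalizes a length mismatch between source and candidate.
    "length": lambda x, y: -float(abs(len(x.split()) - len(y.split()))),
}
TOY_WEIGHTS = {"lexicon": 1.0, "length": 0.5}

if __name__ == "__main__":
    print(decide("assi lo hizo", ["assi lo hizo", "así lo hizo"],
                 TOY_FEATURES, TOY_WEIGHTS))  # prints: así lo hizo
```

Here the unchanged candidate scores 2.0 and the modernized one 3.0, so the modernized spelling is chosen.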

In this approach, modernization is tackled as a conventional translation task: an SMT system is trained on a parallel corpus in which, for each sentence of the original document, its corresponding modernized version is available. The language of the original document is considered the source language, and its modernized version the target language.

2.2 NMT approach

NMT models Eq. (1) with a neural network that usually follows an encoder-decoder architecture: at the encoding step, the source sentence is projected into a distributed representation. Then, at the decoding step, the decoder generates its most likely translation, word by word, using beam search (Sutskever et al., 2014).
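
The decoding step can be illustrated with a generic beam search. In this minimal sketch, a hand-written table of next-word log-probabilities (hypothetical, standing in for the decoder network) is expanded word by word; at each step only the `beam_size` best partial hypotheses are kept, and hypotheses that emit the end-of-sentence token `</s>` are set aside as finished. This is one common variant without length normalization, not necessarily the exact search implemented in OpenNMT-py.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (words so far, total log-probability)
# next_log_probs(prefix) -> log-probability of each possible next word.
NextFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(next_log_probs: NextFn, beam_size: int, max_len: int,
                eos: str = "</s>") -> Hypothesis:
    """Expand hypotheses word by word, keeping only the beam_size best
    partial hypotheses; those ending in eos are moved to `finished`."""
    beam: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates = [(prefix + (word,), score + logp)
                      for prefix, score in beam
                      for word, logp in next_log_probs(prefix).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for words, score in candidates[:beam_size]:
            (finished if words[-1] == eos else beam).append((words, score))
        if not beam:  # every surviving hypothesis has finished
            break
    finished.extend(beam)  # keep hypotheses cut off by max_len
    return max(finished, key=lambda c: c[1])


# Hypothetical toy "decoder": next-word distributions for a few prefixes.
TOY_MODEL = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {"x": math.log(0.3), "y": math.log(0.3), "</s>": math.log(0.4)},
    ("b",): {"</s>": math.log(0.9), "x": math.log(0.1)},
}


def toy_next(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Unknown prefixes end the sentence with probability 1.
    return TOY_MODEL.get(prefix, {"</s>": 0.0})
```

With `beam_size=1` (greedy search) the toy model commits to the locally likely first word `a` and ends with probability 0.24, whereas a beam of 2 recovers `b </s>` with probability 0.36. The systems in this work used a beam size of 6.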

The system’s input is a word sequence in the source language. An embedding matrix linearly projects each word to a fixed-size real-valued vector. These word embeddings are then fed into a bidirectional (Schuster and Paliwal, 1997) long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) network. As a result, a sequence of annotations is produced by concatenating the hidden states from the forward and backward layers. An attention mechanism (Bahdanau et al., 2015) allows the decoder to focus on parts of the input sequence by computing a weighted mean of the annotations. A soft alignment model computes these weights, scoring each annotation against the previous decoding state. The decoder is another LSTM network, conditioned on the representation computed by the attention model and on the last word generated. Finally, a deep output layer (Pascanu et al., 2013) computes a distribution over the target language vocabulary. The whole model is trained jointly with stochastic gradient descent to maximize the log-likelihood over a bilingual parallel corpus.
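
The soft alignment step can be sketched with additive attention in the style of Bahdanau et al. (2015): each annotation $h_j$ receives an energy $e_j = v^\top \tanh(W s_{t-1} + U h_j)$ from the previous decoder state $s_{t-1}$, the energies are normalized with a softmax into weights $\alpha_j$, and the context vector is the weighted mean $\sum_j \alpha_j h_j$. The parameter names `W`, `U` and `v` and the toy values are assumptions for illustration; in a trained model they are learned, and the annotations come from the bidirectional LSTM.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, annotations: List[Vector],
                       W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Return (alpha, context): e_j = v . tanh(W s_prev + U h_j),
    alpha = softmax(e), context = sum_j alpha_j h_j."""
    ws = matvec(W, s_prev)
    energies = []
    for h in annotations:
        uh = matvec(U, h)
        energies.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, ws, uh)))
    alpha = softmax(energies)
    context = [sum(a_j * h[k] for a_j, h in zip(alpha, annotations))
               for k in range(len(annotations[0]))]
    return alpha, context


# Toy 2-dimensional example (all values are illustrative assumptions).
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
alpha, context = additive_attention(
    s_prev=[0.0, 0.0],
    annotations=[[1.0, 0.0], [0.0, 0.0]],
    W=IDENTITY, U=IDENTITY, v=[1.0, 1.0])
```

In the toy example the first annotation gets the higher energy, so it receives most of the weight (about 0.68) and dominates the context vector.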

Like the SMT approach (see Section 2.1), our proposal tackles modernization as a conventional translation task, but using NMT instead of SMT. Additionally, NMT systems need larger quantities of training data, and parallel training data are frequently scarce when working with historical documents (Bollmann and Søgaard, 2016). Therefore, we created synthetic data in order to make use of modern documents to enrich the NMT models. First, we applied feature decay algorithms (Biçici and Yuret, 2015) to select the sentences from the modern documents that are closest to the documents we have to modernize. After that, we followed a backtranslation approach (Sennrich et al., 2015) to create a parallel synthetic corpus. Backtranslation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios (Poncelas et al., 2018). Given a monolingual corpus in the target language and an MT system trained to translate from the target language into the source language, the synthetic data are generated by translating the monolingual corpus with that MT system: the resulting translations are used as the source side of the corpus, and the monolingual data as the target side.
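
The two data steps can be sketched as follows. `fda_select` is a simplified, greedy version of feature decay selection: features are the word n-grams (up to trigrams, an assumption) of the in-domain text, each feature starts with value 1, and every time a selected sentence covers a feature its value is halved, so later picks favour n-grams not yet covered. The published algorithm allows other initialization, decay and length-normalization choices and uses a more efficient priority queue. `backtranslate` then pairs each selected modern sentence with a synthetic original-language sentence produced by any modern-to-original translation function; the function passed in here is a hypothetical placeholder for the SMT system trained in that direction.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(sentence: str, max_n: int = 3) -> Set[NGram]:
    """All distinct word n-grams of the sentence, for n = 1..max_n."""
    words = sentence.split()
    return {tuple(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def fda_score(sentence: str, features: Set[NGram], covered: Counter,
              decay: float, max_n: int) -> float:
    """Sum of decayed feature values, normalized by sentence length."""
    shared = ngrams(sentence, max_n) & features
    value = sum(decay ** covered[f] for f in shared)
    return value / max(len(sentence.split()), 1)


def fda_select(pool: List[str], in_domain: Iterable[str], k: int,
               decay: float = 0.5, max_n: int = 3) -> List[str]:
    """Greedy, simplified feature decay selection of k sentences from pool."""
    features: Set[NGram] = set()
    for sentence in in_domain:
        features |= ngrams(sentence, max_n)
    covered: Counter = Counter()  # how many selected sentences contain each feature
    remaining, selected = list(pool), []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: fda_score(s, features, covered, decay, max_n))
        remaining.remove(best)
        selected.append(best)
        covered.update(ngrams(best, max_n) & features)
    return selected


def backtranslate(modern: List[str],
                  modern_to_original: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Build (synthetic original, real modern) training pairs."""
    return [(modern_to_original(sentence), sentence) for sentence in modern]


# Toy data; the lambda is a hypothetical stand-in for a modern-to-original MT system.
POOL = ["the cat sat", "dogs bark loudly", "the cat ran", "stocks fell today"]
IN_DOMAIN = ["the cat sat on the mat", "a dog barks"]
selected = fda_select(POOL, IN_DOMAIN, k=2)
pairs = backtranslate(selected, lambda s: s.replace("cat", "catte"))
```

On the toy pool, `fda_select` first picks `the cat sat` (all of its n-grams occur in the in-domain text) and then `the cat ran`, whose shared n-grams are now worth half as much; the unrelated sentences score zero.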

3 Experimental framework

Table 1: Corpora statistics. |S| stands for number of sentences, |T| for number of tokens and |V| for size of the vocabulary. Modern documents refers to the monolingual data used to create the synthetic data. M denotes millions and K thousands.
                       Dutch Bible            El Quijote             OE-ME
                       Original  Modernized   Original  Modernized   Original  Modernized
Train       |S|        35.2K                  10K                    2716
            |T|        870.4K    862.4K       283.3K    283.2K       64.3K     69.6K
            |V|        53.8K     42.8K        31.7K     31.3K        13.3K     8.6K
Validation  |S|        2000                   2000                   500
            |T|        56.4K     54.8K        53.2K     53.2K        12.2K     13.3K
            |V|        9.1K      7.8K         10.7K     10.6K        4.2K      3.2K
Test        |S|        5000                   2000                   500
            |T|        145.8K    140.8K       41.8K     42.0K        11.9K     12.9K
            |V|        10.5K     9.0K         8.9K      9.0K         4.1K      3.2K
Modern      |S|        3.0M                   2.0M                   6.0M
documents   |T|        76.1M     74.1M        22.3M     22.2M        67.5M     71.6M
            |V|        1.7M      1.7M         210.1K    211.7K       290.2K    287.4K

In this section, we describe the MT systems, corpora and evaluation metrics from our experimental framework.

3.1 MT systems

SMT systems were trained with Moses (Koehn et al., 2007), following the standard procedure: we estimated a 5-gram language model, smoothed with the improved Kneser-Ney method, using SRILM (Stolcke, 2002), and optimized the weights of the log-linear model with MERT (Och, 2003). SMT systems were used both for the SMT modernization approach and for generating synthetic data (see Section 2.2).

We built NMT systems using OpenNMT-py (Klein et al., 2017). We used long short-term memory units (Gers et al., 2000), with all model dimensions set to 512. We trained the systems using Adam (Kingma and Ba, 2014) with a fixed learning rate of 0.0002 and a batch size of 60, and applied label smoothing of 0.1 (Szegedy et al., 2015). At inference time, we used beam search with a beam size of 6. To reduce the vocabulary size, we applied joint byte pair encoding (BPE) (Sennrich et al., 2016) to all corpora, using 32,000 merge operations. NMT systems were first trained on the synthetic data and then fine-tuned on the training data.
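
Joint BPE learns a single list of merge operations from both sides of the corpus. The minimal sketch below follows the merge-learning loop described by Sennrich et al. (2016): words are split into characters plus an end-of-word marker `</w>`, and the most frequent adjacent symbol pair is merged repeatedly. It is not a production implementation such as subword-nmt; ties are broken by first occurrence, and `apply_bpe` simply replays the merges in the order they were learned. The toy word counts are illustrative assumptions; the systems in this work used 32,000 merges.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn merge operations from word frequencies (for joint BPE, count
    words from both the original and the modernized side)."""
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pair_freqs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_freqs[pair] += freq
        if not pair_freqs:
            break  # every word is already a single symbol
        best = max(pair_freqs, key=pair_freqs.get)  # ties: first seen wins
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


TOY_COUNTS = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
MERGES = learn_bpe(TOY_COUNTS, num_merges=3)
```

On this toy vocabulary, the first three merges are `e s`, `es t` and `est </w>`, so `newest` is segmented as `n e w est</w>`.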

3.2 Corpora

Dutch Bible

(Tjong Kim Sang et al., 2017): a collection of different versions of the Dutch Bible. Among others, it contains a version from 1637, which we consider the original version, and another from 1888, which we consider the modern version (i.e., we use 19th-century Dutch as if it were modern Dutch).

El Quijote

(Domingo and Casacuberta, 2018): the well-known 17th-century Spanish novel by Miguel de Cervantes, and its corresponding 21st-century version.

OE-ME

(Sen et al., 2019): contains the original 11th-century English text The Homilies of the Anglo-Saxon Church and a 19th-century version, which we consider modern English.

As reflected in Table 1, the corpora are small. Hence the use of synthetic data to make use of modern documents and increase the training data (see Section 2.2). As modern documents, we used the collection of Dutch books available at the Digitale Bibliotheek voor de Nederlandse letteren (http://dbnl.nl/) for Dutch, and OpenSubtitles (Lison and Tiedemann, 2016), a collection of movie subtitles in different languages, for Spanish and English.

3.3 Metrics

Modernization borrows its evaluation metrics from MT. To assess our proposal, we used:

Translation Error Rate (TER)

(Snover et al., 2006): number of word edit operations (insertions, substitutions, deletions and shifts of word sequences), normalized by the number of words in the reference translation.
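
As a rough illustration of the edit-counting idea, the sketch below computes the minimum number of word insertions, deletions and substitutions (a word-level Levenshtein distance) and normalizes it by the reference length, reported as a percentage like the scores in Table 2. It deliberately omits the block shifts that TER allows, so it is really a word error rate. Since allowing shifts can only reduce the number of edits, treat this value as an upper bound on TER rather than as a substitute for a reference TER implementation.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions
    turning hyp into ref (dynamic programming, one row at a time)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # delete h
                         cur[j - 1] + 1,          # insert r
                         prev[j - 1] + (h != r))  # substitute, or match for free
        prev = cur
    return prev[-1]


def ter_without_shifts(hypothesis: str, reference: str) -> float:
    """Edit count normalized by reference length, as a percentage.
    Block shifts are NOT modelled, so this is a WER-style upper bound on TER."""
    hyp, ref = hypothesis.split(), reference.split()
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)
```

For example, `assi lo hizo` against the reference `así lo hizo` needs one substitution, giving 33.3. The swapped pair `a b` versus `b a` costs 2 edits here, whereas a single shift would fix it under TER.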

BiLingual Evaluation Understudy (BLEU)

(Papineni et al., 2002): geometric average of the modified n-gram precisions, multiplied by a brevity penalty.
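
The definition above can be written as $\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\frac{1}{N}\sum_{n=1}^{N} \log p_n\right)$, with $N = 4$, clipped n-gram precisions $p_n$ and brevity penalty $\mathrm{BP} = \min(1, e^{1 - r/c})$, where $r$ and $c$ are the total reference and hypothesis lengths. The sketch below computes corpus-level BLEU under simplifying assumptions: a single reference per sentence, whitespace tokenization and no smoothing. sacreBLEU, which was used for the reported scores, applies its own tokenization, so numbers from this sketch are not comparable to Table 2.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(words: List[str], n: int) -> Counter:
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (percentage), one reference per hypothesis,
    whitespace tokenization, no smoothing."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len, ref_len = hyp_len + len(h), ref_len + len(r)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(r, n)
            # Clipped counts: each hypothesis n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in ngram_counts(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # a zero precision makes the unsmoothed geometric mean zero
    log_mean = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_mean)
```

A hypothesis identical to its reference scores 100, while one that drops the last of five words keeps perfect precisions but is scaled by the brevity penalty $e^{1 - 5/4} \approx 0.78$.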

We used sacreBLEU (Post, 2018) to ensure consistent BLEU scores. Additionally, we applied approximate randomization tests (Riezler and Maxwell, 2005), with 10,000 repetitions and a significance level of 0.05, to determine whether the differences between two systems were statistically significant.
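
Paired approximate randomization can be sketched as follows: in each repetition, the outputs of the two systems are swapped sentence by sentence with probability 0.5, the metric difference is recomputed, and the p-value is the fraction of shuffles whose difference is at least as large as the observed one. The add-one correction, the fixed random seed and the toy `exact_match` metric are assumptions for illustration; any corpus-level metric, such as BLEU or TER, can be passed as `metric`. The default of 10,000 repetitions matches the setting above, and a difference would be deemed significant when the returned p-value is below 0.05.

```python
import random
from typing import Callable, List, Sequence

Metric = Callable[[List[str], List[str]], float]


def approximate_randomization(sys_a: Sequence[str], sys_b: Sequence[str],
                              refs: Sequence[str], metric: Metric,
                              repetitions: int = 10_000, seed: int = 0) -> float:
    """Paired approximate randomization test; returns an estimated p-value
    for the observed difference in `metric` between systems A and B."""
    rng = random.Random(seed)
    refs = list(refs)
    observed = abs(metric(list(sys_a), refs) - metric(list(sys_b), refs))
    at_least_as_large = 0
    for _ in range(repetitions):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(sys_a, sys_b):
            if rng.random() < 0.5:  # swap this sentence's outputs
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(metric(shuffled_a, refs) - metric(shuffled_b, refs)) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (repetitions + 1)


def exact_match(hyps: List[str], refs: List[str]) -> float:
    """Toy metric: fraction of hypotheses identical to their reference."""
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)
```

Two identical systems always yield a p-value of 1.0, since every shuffle reproduces the observed difference of zero.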

Table 2: Experimental results. Baseline corresponds to considering the original document as its modernized version. Both the SMT and NMT approaches significantly improve over the baseline; the difference between the NMT and SMT approaches is statistically significant only for Dutch Bible. TER [↓]: the lower the value, the higher the quality. BLEU [↑]: the higher the value, the higher the quality.
Approach   Dutch Bible          El Quijote           OE-ME
           TER [↓]   BLEU [↑]   TER [↓]   BLEU [↑]   TER [↓]   BLEU [↑]
Baseline   57.9      12.9       44.2      36.3       91.0      2.8
SMT        11.5      77.5       30.7      58.3       39.6      39.6
NMT        11.1      80.6       31.9      57.3       44.3      35.9

4 Evaluation

In order to assess the quality of our modernization approaches, we started by performing an automatic evaluation. Then, with the help of 4 scholars, we conducted a human evaluation.

4.1 Automatic evaluation

Table 2 presents the results of the experimental session. Both approaches significantly improved the modernization quality with respect to the baseline. Differences between the SMT and NMT approaches were only statistically significant for Dutch Bible. In that case, the NMT approach yielded the best results: an improvement of 46.8 TER points and 67.7 BLEU points with respect to the baseline, and of 0.4 TER points and 2.9 BLEU points with respect to the SMT approach.

To the best of our knowledge, this is the first time that an NMT modernization approach has achieved results of this kind. Domingo and Casacuberta (2018) already tried to make use of modern documents to enrich the neural models. However, their approach only improved the modernization quality in some cases (and never enough to reach the quality of the SMT approach), while in others it lowered it significantly. Our approach was based on theirs, but we used a data selection technique to filter the monolingual data in order to generate synthetic data better suited to each task.

4.2 Human evaluation

The human evaluation was performed by 4 scholars specialized in classic Spanish literature. For this reason, it was conducted using El Quijote. We randomly selected 100 sentences, checking that their modernizations differed from the original sentences. We showed each sentence together with its modernization (50 sentences modernized with the SMT approach and another 50 with the NMT approach) and asked the scholars to rate, on a scale from 1 to 5, the quality of the following aspects: fluency, lexical meaning, syntax, semantics and modernization. To avoid any bias, we shuffled the sentences and did not give the evaluators any detail about how the modernizations had been produced. Table 3 shows the results of the evaluation.

Table 3: Results of the human evaluation. Values correspond to the average score for all sentences of each approach. 1 is the lowest score and 5 is the highest.
            SMT approach                          NMT approach
Scholar     Flu.  Lex.  Syn.  Sem.  Mod.          Flu.  Lex.  Syn.  Sem.  Mod.
Scholar1    5.0   4.3   4.3   4.6   3.9           4.8   4.0   4.0   4.1   4.0
Scholar2    2.1   1.9   2.0   2.1   2.0           2.0   1.9   1.9   1.9   1.9
Scholar3    3.2   3.1   2.9   2.9   3.1           3.3   3.2   2.9   3.0   3.1
Scholar4    4.5   3.9   4.6   4.3   4.0           3.8   3.5   3.7   3.7   3.5
Average     3.7   3.3   3.4   3.5   3.2           3.4   3.1   3.1   3.2   3.1
(Flu. = Fluency, Lex. = Lexical meaning, Syn. = Syntax, Sem. = Semantic, Mod. = Modernization)

While the automatic evaluation (see Section 4.1) did not show any significant differences between the SMT and NMT approaches for El Quijote, the human evaluators slightly preferred SMT over NMT. Scores vary considerably depending on the evaluator: scholar1 and scholar4 gave higher scores than scholar2 and scholar3. However, fluency received the highest or nearly the highest score from every evaluator, making it the strongest point of both approaches. In general, average scores are above the midpoint of the scale (3), which seems to correlate with the automatic evaluation.

When we asked the evaluators for their opinion, they commented that the main problems were related to punctuation and diacritical marks. They also mentioned that, sometimes, part of the sentence was lost in the modernization, a known issue of NMT (Wu et al., 2016). Additionally, scholar1 commented that, overall, the quality of the modernization was acceptable. However, scholar2 commented that, if they had to correct the mistakes, they would prefer to do the modernization from scratch.

5 User study

To assess whether modernization is able to decrease the difficulty of comprehending historical documents and, thus, make them accessible to a broader audience, we conducted a user study using El Quijote, in which 42 participants took part. Considering that El Quijote is well known in Spain, we asked participants about their familiarity with it. The figure below (Information about study participants) shows the participants' age distribution and their familiarity with El Quijote.

Figure: Information about study participants. (a) Age distribution: ≤20 years, 2.4%; 21–30 years, 33.3%; 31–40 years, 26.2%; 41–50 years, 21.4%; 51–60 years, 9.5%; 61–70 years, 7.1%. (b) Familiarity with El Quijote: unfamiliar, 2.4%; know what it is about, 7.1%; read fragments of an adaptation, 14.3%; read an adaptation, 19.0%; read fragments of the original, 14.3%; read the original, 35.7%; read a modernized version, 7.1%.

The majority of the participants were between 20 and 50 years old, but there were also older and younger people. With one exception, all participants were familiar with El Quijote to some extent. In fact, 35.7% of them had read the original version of the novel.

The study consisted of several questions in which we showed two sentences to the user, the original sentence and its modernized version (produced either by the SMT or by the NMT approach), and asked them to select which sentence was easier for them to read and comprehend, whether both had the same difficulty, or whether they thought that the two sentences did not have the same meaning. The selected sentences were the same ones used in the human evaluation (see Section 4.2). To avoid any bias, the order in which the sentences appeared (i.e., the original sentence and its modernized version) was randomized, as was the use of the different approaches. Figure 2 shows an example of a question.

Select the sentence which is easier for you to read and comprehend:
  • Y, leuantandose, dexó de comer, y fue a quitar la cubierta de la primera imagen, que mostro ser la de San Iorge puesto a cauallo, con vna serpiente enroscada a los pies, y la lança atrauessada por la boca, con la fiereça que suele pintarse.
  • Y levantándose, dejó de comer, y fue a quitar la cubierta de la primera imagen, que mostró ser la de San Jorge puesto a caballo, con una serpiente enroscada a los pies, y la lanza atravesada por la boca, con la fiereça que suele pintarse.
  • Indifferent.
  • Both sentences do not have the same meaning.
Figure 2: Example of a question.

Table 4 presents the results of the study. Despite the users' familiarity with El Quijote, modernization succeeded in making it easier to comprehend. Regardless of the modernization approach, users selected the modernized version in the majority of the cases. In most of the remaining cases, users did not find any significant difference with respect to the original sentence.
When comparing both approaches, we observe that the SMT approach yielded better results: users selected 61.4% of its modernized versions, but only 50.9% of the sentences modernized by the NMT approach. Additionally, users felt that the SMT approach changed the meaning of the sentence in only 7.8% of the cases (versus 20.3% for the NMT approach), and its modernized versions were harder to comprehend than the original in only 3.2% of the cases (versus 6.4% for the NMT approach). Therefore, although neither the automatic nor the human evaluation found significant differences between both approaches, the user study showed that the SMT approach was more successful than the NMT approach at producing versions that are easier to read and comprehend.

Table 4: Results [%] of the user study. Original means that users understood the original version better. Modernized means that users understood the modernized version better. Indifferent means that users did not find any significant difference between the original and modernized versions. Not equal means that users felt that the meanings of the two versions differed.

Approach   Original   Modernized   Indifferent   Not equal
SMT        3.2        61.4         27.6          7.8
NMT        6.4        50.9         22.3          20.3

5.1 Qualitative analysis

In this section, we show some examples of the behavior of the modernization approaches. The example from Figure 2 shows a successfully modernized sentence. Except for one small mistake (fiereça, which should be fiereza), the orthography has been successfully modernized, making the sentence easier to read. (Note that, in this case, orthography is the only thing that needs to be modified in order to achieve a modern Spanish version.)

Original version: Huuolo de conceder don Quixote, y assi lo hizo.
Modernized version: Huéolo de conceder don Quijote, y así lo hizo.
Figure 3: Example of modernization in which the modernized version is similar to the original version.

Figure 3 shows an example in which there is no significant difference between the modernized and the original versions. Only three words have been modified, and one of them (huéolo) is not even a real word but a mistake introduced by the use of BPE. Despite this, some people found the modernized version easier to read, a great majority found no difference between them, and a few either preferred the original version or considered that the two did not have the same meaning.

Original version: Ofreciosele el gallardo pastor, pidiole que se viniesse con el a sus tiendas;
Modernized version: Se le rosó el gallardo pastor, pile dio que se viniese con él a sus tiendas;
Figure 4: Example of modernization in which users preferred the original version over the modernized one.

In Figure 4, we can see an example in which the original sentence is easier to understand than its modernized version. While users considered both versions to have the same meaning, the modernized one is harder to comprehend, since the first half of the sentence does not make much sense. In fact, in the human evaluation, scholars considered the modernized version to be more or less fluent, but with poor lexical meaning, syntax and semantics.
Original version: Que me maten si los encantadores que me persiguen no quieren enredarme en ellas, y detener mi camino, como en vengança de la riguridad que con Altissidora he tenido.
Modernized version: – Con mucho gusto?
Figure 5: Example of modernization in which the modern version differs with respect to the original.

Finally, Figure 5 shows an example in which the modernization went very wrong. On the one hand, the modernized version is much shorter than the original version. On the other hand, its meaning bears no relation to the original one.

6 Conclusions and future work

In this work, we proposed a new NMT modernization approach in order to tackle the language barrier inherent in historical documents. We tested this approach on three historical datasets from different languages and time periods, comparing it with the state-of-the-art SMT approach. An automatic evaluation showed that our approach improved on the results achieved by the SMT approach on one dataset; for the other two datasets, results were not statistically different from those of the SMT approach. Additionally, we conducted a human evaluation on the Spanish dataset, involving 4 scholars specialized in classic Spanish literature. Its results correlated with the automatic evaluation.

Finally, we conducted a user study to evaluate whether modernization (with both the SMT and NMT approaches) was able to decrease the difficulty of comprehending historical documents and, thus, increase their accessibility to a broader audience. 42 volunteers of different ages and backgrounds participated in this study, which was conducted using the same Spanish subset as the human evaluation. Results showed that modernization successfully decreased the comprehension difficulty: in most cases, users chose the modernized version as the easiest to read and comprehend. However, there is still room for improvement. Sometimes, the modernization introduced errors that made users feel that the meaning had been changed. Other times, users did not find any significant difference between the original version and its modernization. When comparing the SMT and NMT approaches, the NMT approach made more errors, and users chose its modernized version as the best option less often than with the SMT approach.

As future work, we would like to tackle the main problems pointed out during the human evaluation and the user study, namely punctuation, diacritical marks, the introduction of non-existent words and the loss of part of the sentence.
We would also like to conduct a new human evaluation involving more scholars, languages and datasets, and a new user study for different languages and datasets. Finally, we would like to apply interactive machine translation to modernization, in order to assist scholars in achieving error-free modernizations.

Acknowledgments

The research leading to these results has received funding from the European Union through the Programa Operativo del Fondo Europeo de Desarrollo Regional (FEDER) from Comunitat Valenciana (2014–2020) under project Sistemas de fabricación inteligentes para la industria 4.0 (grant agreement IDIFEDER/2018/025); and from Ministerio de Economía y Competitividad (MINECO) under project MISMIS-FAKEnHATE (grant agreement PGC2018-096212-B-C31). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a GPU used for part of this research, and Andrés Trapiello and Ediciones Destino for granting us permission to use their book in our research. Additionally, we would like to thank all the volunteers who took part in the user study, and the scholars from Prolope who took part in the human evaluation.

References

  • Bahdanau et al. (2015) Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. arXiv:1409.0473.
  • Baron and Rayson (2008) Baron, A. and Rayson, P. (2008). VARD2: A tool for dealing with spelling variation in historical corpora. Postgraduate conference in corpus linguistics.
  • Biçici and Yuret (2015) Biçici, E. and Yuret, D. (2015). Optimizing instance selection for statistical machine translation with feature decay algorithms. IEEE/ACM Transactions on Audio, Speech and Language Processing, 23(2):339–350.
  • Bollmann and Søgaard (2016) Bollmann, M. and Søgaard, A. (2016). Improving historical spelling normalization with bi-directional lstms and multi-task learning. In Proceedings of the International Conference on the Computational Linguistics, pages 131–139.
  • Brown et al. (1993) Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
  • Crowther (2003) Crowther, J. (2003). No Fear Shakespeare: Romeo and Juliet. SparkNotes.
  • Domingo and Casacuberta (2018) Domingo, M. and Casacuberta, F. (2018). A machine translation approach for modernizing historical documents using back translation. In Proceedings of the International Workshop on Spoken Language Translation, pages 39–47.
  • Domingo et al. (2017) Domingo, M., Chinea-Rios, M., and Casacuberta, F. (2017). Historical documents modernization. The Prague Bulletin of Mathematical Linguistics, 108:295–306.
  • Gers et al. (2000) Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10):2451–2471.
  • Hämäläinen et al. (2018) Hämäläinen, M., Säily, T., Rueter, J., Tiedemann, J., and Mäkelä, E. (2018). Normalizing early english letters to present-day english spelling. In Proceedings of the Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 87–96.
  • Hochreiter and Schmidhuber (1997) Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780.
  • Kingma and Ba (2014) Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Klein et al. (2017) Klein, G., Kim, Y., Deng, Y., Senellart, J., and Rush, A. M. (2017). OpenNMT: Open-Source Toolkit for Neural Machine Translation. In Proceedings of the Association for Computational Linguistics: System Demonstration, pages 67–72.
  • Koehn et al. (2007) Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 177–180.
  • Koehn et al. (2003) Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 48–54.
  • Laing (1993) Laing, M. (1993). The linguistic analysis of medieval vernacular texts: Two projects at Edinburgh. In Corpora across the Centuries: Proceedings of the First International Colloquium on English Diachronic Corpora, edited by M. Rissanen, M. Kytö, and S. Wright. St Catharine’s College Cambridge, volume 25427, pages 121–141.
  • Lison and Tiedemann (2016) Lison, P. and Tiedemann, J. (2016). OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of the International Conference on Language Resources and Evaluation, pages 923–929.
  • Och (2003) Och, F. J. (2003). Minimum error rate training in statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 160–167.
  • Och and Ney (2002) Och, F. J. and Ney, H. (2002). Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 295–302.
  • Papineni et al. (2002) Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 311–318.
  • Pascanu et al. (2013) Pascanu, R., Gulcehre, C., Cho, K., and Bengio, Y. (2013). How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026.
  • Poncelas et al. (2018) Poncelas, A., Shterionov, D., Way, A., Maillette de Buy Wenniger, G., and Passban, P. (2018). Investigating backtranslation in neural machine translation. In Proceedings of the Annual Conference of the European Association for Machine Translation, pages 249–258.
  • Porta et al. (2013) Porta, J., Sancho, J.-L., and Gómez, J. (2013). Edit transducers for spelling variation in Old Spanish. In Proceedings of the Workshop on Computational Historical Linguistics, pages 70–79.
  • Post (2018) Post, M. (2018). A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation, pages 186–191.
  • Riezler and Maxwell (2005) Riezler, S. and Maxwell, J. T. (2005). On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 57–64.
  • Schuster and Paliwal (1997) Schuster, M. and Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.
  • Sen et al. (2019) Sen, S., Hasanuzzaman, M., Ekbal, A., Bhattacharyya, P., and Way, A. (2019). Take help from elder brother: Old to modern English NMT with phrase pair feedback. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing. In press.
  • Sennrich et al. (2015) Sennrich, R., Haddow, B., and Birch, A. (2015). Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709.
  • Sennrich et al. (2016) Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 1715–1725.
  • Snover et al. (2006) Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas, pages 223–231.
  • Stolcke (2002) Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing, pages 901–904.
  • Sutskever et al. (2014) Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems, volume 27, pages 3104–3112.
  • Szegedy et al. (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9.
  • Tjong Kim Sang et al. (2017) Tjong Kim Sang, E., Bollmann, M., Boschker, R., Casacuberta, F., Dietz, F., Dipper, S., Domingo, M., van der Goot, R., van Koppen, M., Ljubešić, N., Östling, R., Petran, F., Pettersson, E., Scherrer, Y., Schraagen, M., Sevens, L., Tiedemann, J., Vanallemeersch, T., and Zervanou, K. (2017). The CLIN27 shared task: Translating historical text to contemporary language for improving automatic linguistic annotation. Computational Linguistics in the Netherlands Journal, 7:53–64.
  • Wu et al. (2016) Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144.
  • Zens et al. (2002) Zens, R., Och, F. J., and Ney, H. (2002). Phrase-based statistical machine translation. In Proceedings of the Annual German Conference on Advances in Artificial Intelligence, volume 2479, pages 18–32.