StRE: Self Attentive Edit Quality Prediction in Wikipedia

Abstract

Wikipedia can justifiably be called a behemoth, considering the sheer volume of content that is added to or removed from its several projects every minute. This creates immense scope in the field of natural language processing for developing automated tools for content moderation and review. In this paper we propose the Self Attentive Revision Encoder (StRE), which leverages the orthographic similarity of lexical units to predict the quality of new edits. In contrast to existing approaches, which primarily employ features like page reputation, editor activity, or rule-based heuristics, we utilize the textual content of the edits, which we believe contains superior signatures of their quality. More specifically, we deploy deep encoders to generate representations of the edits from their text content, which we then leverage to infer quality. We further contribute a novel dataset containing 21M revisions across 32K Wikipedia pages and demonstrate that StRE outperforms existing methods by a significant margin: at least 17% and at most 103%. Our pretrained model achieves these results after retraining on a set as small as 20% of the edits in a wikipage. To the best of our knowledge, this is also the first attempt to apply deep language models to the enormous domain of automated content moderation and review in Wikipedia.
