Defending Against Neural Fake News

Abstract

Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news.

Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary's point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like 'Link Found Between Vaccines and Autism,' Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation.

Developing robust verification techniques against generators like Grover is critical. We find that the best current discriminators can distinguish neural fake news from real, human-written news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias, and sampling strategies that alleviate its effects, both leave artifacts that similar discriminators can pick up on. We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.
