ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

Abstract

This paper presents a partial reproduction of Generating Fact Checking Explanations by Atanasova et al. (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation. This shared task aims to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criterion of Coverage. The results of our reproduction and our reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.
