Abstract
Sequence-to-sequence models have recently achieved state-of-the-art performance in summarization. However, few large-scale, high-quality datasets are available, and almost all of the available ones consist mainly of news articles with a specific writing style. Moreover, abstractive, human-style systems that describe the content at a deeper level require data with higher levels of abstraction. In this paper, we present WikiHow, a dataset of more than 230,000 article and summary pairs extracted and constructed from an online knowledge base written by different human authors. The articles span a wide range of topics and therefore represent a high diversity of styles. We evaluate the performance of existing methods on WikiHow to show its challenges and to set baselines for further improvement.