hyperdoc2vec: Distributed Representations of Hypertext Documents

  • 2018-05-10 02:42:03
  • Jialong Han, Yan Song, Wayne Xin Zhao, Shuming Shi, Haisong Zhang
  • 24

Abstract

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although effective on plain documents, conventional text embedding methods suffer from information loss if directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely, hyperdoc2vec, along with four criteria characterizing the information that hyper-document embedding models should preserve. Systematic comparisons are conducted between hyperdoc2vec and several competitors on two tasks, i.e., paper classification and citation recommendation, in the academic paper domain. Analyses and experiments both validate the superiority of hyperdoc2vec over other models w.r.t. the four criteria.
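To make the citation recommendation task concrete, the following is a minimal sketch that assumes every candidate paper already has a learned embedding vector. It ranks candidates by cosine similarity to a query vector. This is a generic embedding-retrieval baseline, not the scoring function used by hyperdoc2vec. The helper names `cosine` and `recommend`, the paper IDs, and the vector values are all hypothetical toy data chosen for illustration, not results from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(query_vec, candidates, k=2):
    """Return the IDs of the k candidate papers most similar to query_vec.

    `candidates` maps a (hypothetical) paper ID to its embedding vector.
    """
    scored = sorted(
        candidates.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [paper_id for paper_id, _ in scored[:k]]

# Hypothetical toy embeddings; IDs and values are illustrative only.
candidates = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.8, 0.3],
    "paper_c": [0.7, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(recommend(query, candidates))  # → ['paper_a', 'paper_c']
```

Any model that maps papers and citation contexts into a shared vector space can be plugged into this ranking step. How good the recommendations are depends entirely on what information the embeddings preserve, which is the concern the four criteria above address.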
