High Fidelity Vector Space Models of Structured Data

Abstract

Machine learning systems regularly deal with structured data in real-world applications. Unfortunately, such data has been difficult to represent faithfully in the form most machine learning techniques expect, i.e., as a real-valued vector of a fixed, pre-specified size. In this work, we introduce a novel approach that compiles structured data into a satisfiability problem whose set of solutions contains at least (and often only) the input data. The satisfiability problem is built from constraints that are generated automatically, a priori, from a given signature, which trivially allows a bag-of-words-esque vector representation of the input to be constructed. The method is demonstrated in two areas, automated reasoning and natural language processing, where it is shown to be near-perfect at producing vector representations of natural-language sentences and first-order logic clauses that can be translated back into their original, structured input forms.
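To make the idea of "constraints generated a priori from a signature" more concrete, the following is a toy sketch under our own assumptions; it is not the paper's construction. We assume a hypothetical signature of function symbols with arities, and we assume each constraint simply counts either the root symbol or a (parent symbol, argument position, child symbol) edge in a term. Because every constraint is enumerated from the signature alone, every term maps to a fixed-length count vector. Decoding is done here by brute-force enumeration of terms with the implied number of nodes, standing in for a real SAT solver.

```python
from collections import Counter
from itertools import product

# Hypothetical toy signature (assumption): symbol -> arity.
SIGNATURE = {"f": 2, "g": 1, "a": 0, "b": 0}

def feature_names(sig):
    """Enumerate every constraint a priori, from the signature alone."""
    names = [("root", s) for s in sig]
    for parent, arity in sig.items():
        for pos in range(arity):
            for child in sig:
                names.append((parent, pos, child))
    return names

FEATURES = feature_names(SIGNATURE)

def counts(term, is_root=True):
    """Count root and (parent, position, child) features in a nested-tuple term."""
    c = Counter()
    if is_root:
        c[("root", term[0])] += 1
    for pos, child in enumerate(term[1:]):
        c[(term[0], pos, child[0])] += 1
        c += counts(child, is_root=False)
    return c

def encode(term):
    """Map a term to a fixed-length, bag-of-words-style count vector."""
    c = counts(term)
    return [c[name] for name in FEATURES]

def compositions(total, parts):
    """Yield ways to split `total` nodes into `parts` subtrees of size >= 1."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first, *rest)

def terms_of_size(n, sig):
    """Yield every well-formed term over `sig` with exactly n nodes."""
    for sym, arity in sig.items():
        if arity == 0:
            if n == 1:
                yield (sym,)
            continue
        for sizes in compositions(n - 1, arity):
            kid_choices = [list(terms_of_size(k, sig)) for k in sizes]
            for kids in product(*kid_choices):
                yield (sym, *kids)

def decode(vector):
    """Return every term whose encoding equals `vector` (brute-force 'solver')."""
    edges = sum(v for name, v in zip(FEATURES, vector) if len(name) == 3)
    return [t for t in terms_of_size(edges + 1, SIGNATURE) if encode(t) == vector]

t = ("f", ("a",), ("g", ("b",)))
print(decode(encode(t)))  # [('f', ('a',), ('g', ('b',)))]
```

In this toy scheme, `f(a, g(b))` is the only solution of its own constraints, whereas `decode(encode(("f", ("g", ("a",)), ("g", ("b",)))))` returns two candidates, `f(g(a), g(b))` and `f(g(b), g(a))`. This mirrors, in miniature, the "at least (and often only)" property described above; a richer constraint set would be needed to break such ties.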
