Reprogramming Language Models for Molecular Representation Learning

Abstract

Recent advancements in transfer learning have made it a promising approach for domain adaptation via transfer of learned representations. This is especially relevant when alternate tasks have limited samples of well-defined and labeled data, which is common in the molecule data domain. This makes transfer learning an ideal approach to solve molecular learning tasks. While adversarial reprogramming has proven to be a successful method to repurpose neural networks for alternate tasks, most works consider source and alternate tasks within the same domain. In this work, we propose a new algorithm, Representation Reprogramming via Dictionary Learning (R2DL), for adversarially reprogramming pretrained language models for molecular learning tasks, motivated by leveraging learned representations in massive state-of-the-art language models. The adversarial program learns a linear transformation between a dense source model input space (language data) and a sparse target model input space (e.g., chemical and biological molecule data) using a k-SVD solver to approximate a sparse representation of the encoded data, via dictionary learning. R2DL achieves the baseline established by state-of-the-art toxicity prediction models trained on domain-specific data and outperforms the baseline in a limited training-data setting, thereby establishing avenues for domain-agnostic transfer learning for tasks with molecule data.
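
The linear transformation described above can be illustrated with a small sketch. It is not the paper's implementation. It assumes the source token embeddings act as a fixed dictionary and that each target (molecule) token embedding is approximated by a sparse linear combination of them, so that Theta @ V_S approximates V_T. Only the sparse-coding step is shown, using orthogonal matching pursuit (OMP); a full k-SVD solver would alternate this step with dictionary updates. The names `omp`, `sparse_map`, `V_S`, `V_T`, and `theta`, the matrix sizes, and the random data are all hypothetical.

```python
import numpy as np

def omp(atoms, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of `atoms`.

    Assumes k >= 1 and that no column of `atoms` is all zeros.
    """
    norms = np.linalg.norm(atoms, axis=0)
    support = []
    residual = y.astype(float).copy()
    for _ in range(k):
        # Score each atom by its normalized correlation with the residual.
        scores = np.abs(atoms.T @ residual) / norms
        if support:
            scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Refit the coefficients of all chosen atoms by least squares.
        sol, *_ = np.linalg.lstsq(atoms[:, support], y, rcond=None)
        residual = y - atoms[:, support] @ sol
    coef = np.zeros(atoms.shape[1])
    coef[support] = sol
    return coef

def sparse_map(V_S, V_T, k):
    """Return theta (n_target x n_source), at most k nonzeros per row,
    chosen so that theta @ V_S approximates V_T."""
    atoms = V_S.T  # columns are source token embeddings
    return np.stack([omp(atoms, v, k) for v in V_T])

# Hypothetical sizes: 50 source tokens, 5 target tokens, 16-dim embeddings.
rng = np.random.default_rng(0)
V_S = rng.normal(size=(50, 16))  # stand-in for language-model token embeddings
V_T = rng.normal(size=(5, 16))   # stand-in for molecule token embeddings
theta = sparse_map(V_S, V_T, k=4)

rel_err = np.linalg.norm(theta @ V_S - V_T) / np.linalg.norm(V_T)
print("nonzeros per row:", np.count_nonzero(theta, axis=1))
print("relative reconstruction error:", rel_err)
```

Raising `k` trades sparsity for reconstruction accuracy. The sketch holds the dictionary fixed, so it covers only the sparse-coding half of what a k-SVD solver does.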
