LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library

  • 2025-05-03 14:06:16
  • Tianhao Yu, Cai Yao, Zhuorui Sun, Feng Shi, Lin Zhang, Kangjie Lyu, Xuan Bai, Andong Liu, Xicheng Zhang, Jiali Zou, Wenshou Wang, Chris Lai, Kai Wang

Abstract

In this study, we generate and maintain a database of 10 million virtual lipids through METiS's in-house de novo lipid generation algorithms and lipid virtual screening techniques. These virtual lipids serve as a corpus for pre-training, lipid representation learning, and downstream task knowledge transfer, culminating in state-of-the-art LNP property prediction performance. We propose LipidBERT, a BERT-like model pre-trained with the Masked Language Model (MLM) objective and various secondary tasks. Additionally, we compare the performance of embeddings generated by LipidBERT and PhatGPT, our GPT-like lipid generation model, on downstream tasks. The proposed bilingual LipidBERT model operates in two languages: the language of ionizable lipid pre-training, using in-house dry-lab lipid structures, and the language of LNP fine-tuning, utilizing in-house LNP wet-lab data. This dual capability positions LipidBERT as a key AI-based filter for future screening tasks, including new versions of METiS de novo lipid libraries and, more importantly, candidates for in vivo testing for organ-targeting LNPs. To the best of our knowledge, this is the first successful demonstration of the capability of a pre-trained language model on virtual lipids and its effectiveness in downstream tasks using wet-lab data. This work showcases the clever utilization of METiS's in-house de novo lipid library as well as the power of dry-wet lab integration.
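To make the MLM objective mentioned in the abstract concrete, the following is a minimal sketch of how masked inputs and prediction targets could be built from a tokenized lipid string. It rests on several assumptions that do not come from the abstract: the lipid is written as a SMILES string, tokenization is character-level, the `[MASK]` token and the 15% masking rate follow the usual BERT convention, and `mask_tokens` is a hypothetical helper name. It also simplifies BERT's scheme by always substituting `[MASK]` instead of mixing in random and unchanged tokens. It is not LipidBERT's actual pipeline.

```python
import random

MASK = "[MASK]"  # assumed mask token, following the BERT convention


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Build a masked copy of `tokens` and the matching MLM labels.

    labels[i] holds the original token where position i was masked,
    and None where the token was left unchanged (no loss is computed there).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical example input: a small amino alcohol with an alkyl tail,
# written as SMILES and split into single characters (an assumed tokenizer).
smiles = "CCCCCCCCN(C)CCO"
tokens = list(smiles)
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
```

During pre-training, a model would receive `masked` as input and be scored only on the positions where `labels` is not `None`.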

 
