Abstract
Generating novel and functional protein sequences is critical to a wide range of applications in biology. Recent advancements in conditional diffusion models have shown impressive empirical performance in protein generation tasks. However, reliable protein generation remains an open research question in de novo protein design, especially when it comes to conditional diffusion models. Considering that the biological function of a protein is determined by multi-level structures, we propose a novel multi-level conditional diffusion model that integrates both sequence-based and structure-based information for efficient end-to-end protein design guided by specified functions. By generating representations at different levels simultaneously, our framework can effectively model the inherent hierarchical relations between different levels, resulting in an informative and discriminative representation of the generated protein. We also propose Protein-MMD, a new reliable evaluation metric, to evaluate the quality of proteins generated with conditional diffusion models. Our new metric is able to capture both distributional and functional similarities between real and generated protein sequences while ensuring conditional consistency. We experiment with benchmark datasets, and the results on conditional protein generation tasks demonstrate the efficacy of the proposed generation framework and evaluation metric.