Abstract
The complementarity-determining regions (CDRs) of antibodies are loop structures that are key to their interactions with antigens and of high importance to the design of novel biologics. Since the 1980s, categorizing the diversity of CDR structures into canonical clusters has enabled the identification of key structural motifs of antibodies. However, existing approaches have limited coverage and cannot be readily incorporated into protein foundation models. Here we introduce the ImmunoGlobulin LOOp Tokenizer (Igloo), a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence. Igloo is trained using a contrastive learning objective to map loops with similar backbone dihedral angles closer together in latent space. Igloo can efficiently retrieve the closest matching loop structures from a structural antibody database, outperforming existing methods on identifying similar H3 loops by 5.9\%. Igloo assigns tokens to all loops, addressing the limited coverage of canonical clusters while retaining the ability to recover canonical loop conformations. To demonstrate the versatility of Igloo tokens, we show that they can be incorporated into protein language models with IglooLM and IglooALM. On predicting the binding affinity of heavy chain variants, IglooLM outperforms the base protein language model on 8 out of 10 antibody-antigen targets. Additionally, it is on par with existing state-of-the-art sequence-based and multimodal protein language models, performing comparably to models with $7\times$ more parameters. IglooALM samples antibody loops that are diverse in sequence and more consistent in structure than those from state-of-the-art antibody inverse folding models. Igloo demonstrates the benefit of introducing multimodal antibody loop tokens for encoding the diverse landscape of antibody loops, improving protein foundation models, and designing antibody CDRs.