Abstract
Despite the rapid adoption of autoregressive large language models (LLMs), smaller text encoders still play an important role in text understanding tasks that require rich contextualized representations. Negation is an important semantic function that these encoders still fail to capture properly, which affects many downstream applications that rely on text embeddings. We propose a strategy to improve the negation robustness of text encoders by distilling data from LLMs using diverse patterns of negation and hedging. We adopt a standard contrastive learning strategy to fine-tune a strong BERT-based model and observe large improvements in negation understanding while maintaining competitive performance on general benchmarks. We also show that our method can be adapted to LLMs, leading to improved performance on negation benchmarks.