SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials

Abstract

This paper describes our submission to Task 2 of SemEval-2024: Safe Biomedical Natural Language Inference for Clinical Trials. The Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT) consists of a Textual Entailment (TE) task focused on the evaluation of the consistency and faithfulness of Natural Language Inference (NLI) models applied to Clinical Trial Reports (CTR). We test 2 distinct approaches, one based on finetuning and ensembling Masked Language Models and the other based on prompting Large Language Models using templates, in particular, using Chain-Of-Thought and Contrastive Chain-Of-Thought. Prompting Flan-T5-large in a 2-shot setting leads to our best system that achieves 0.57 F1 score, 0.64 Faithfulness, and 0.56 Consistency.
