Abstract
Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human-annotated dataset built on the QA sessions of Earnings Call Transcripts (ECTs), where the answers given by company representatives are often open to subjective interpretation and scrutiny. The dataset includes 49,446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant. These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domains. We find that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has similar weighted F1 scores to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2.17% in their weighted F1 scores. The models perform significantly better on features with higher subjectivity, such as Specific and Assertive, with a mean difference of 10.01% in their weighted F1 scores. Furthermore, testing SubjECTive-QA's generalizability using QAs from White House Press Briefings and Gaggles yields an average weighted F1 score of 65.97% using our best models for each feature, demonstrating broader applicability beyond the financial domain. SubjECTive-QA is publicly available under the CC BY 4.0 license.