Retrieving Classes of Causal Orders with Inconsistent Knowledge Bases

Abstract

Traditional causal discovery methods often rely on strong, untestable assumptions, which makes them unreliable in real applications. In this context, Large Language Models (LLMs) have emerged as a promising alternative for extracting causal knowledge from text-based metadata, which consolidates domain expertise. However, LLMs tend to be unreliable and prone to hallucinations, necessitating strategies that account for their limitations. One effective strategy is to use a consistency measure to assess reliability. Additionally, most text metadata does not clearly distinguish direct causal relationships from indirect ones, further complicating the discovery of a causal DAG. As a result, focusing on causal orders, rather than causal DAGs, emerges as a more practical and robust approach. We present a new method to derive a class of acyclic tournaments, which represent plausible causal orders, maximizing a consistency score derived from an LLM. Our approach starts by calculating pairwise consistency scores between variables, resulting in a semi-complete partially directed graph that consolidates these scores into an abstraction of the maximally consistent causal orders. Using this structure, we identify optimal acyclic tournaments, focusing on those that maximize consistency across all configurations. We subsequently show how both the abstraction and the class of causal orders can be used to estimate causal effects. We tested our method on well-established benchmarks as well as real-world datasets from epidemiology and public health. Our results demonstrate the effectiveness of our approach in recovering the correct causal order.
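
To make the idea of a class of maximally consistent causal orders concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical dictionary `score` that maps each ordered pair `(a, b)` to a consistency score for "a precedes b", and it assumes the score of a causal order is the sum of the scores of all pairs it orders. It then enumerates every permutation by brute force, which is only feasible for a handful of variables, and returns all orders that tie for the maximum. The function name, the scoring rule, and the example values are illustrative assumptions.

```python
from itertools import permutations
from math import isclose


def best_causal_orders(variables, score):
    """Return (best_total, orders): every ordering of `variables` that
    maximizes the summed pairwise score (assumed scoring rule).

    score[(a, b)] is a hypothetical consistency score for 'a precedes b'.
    """
    best_total = float("-inf")
    best_orders = []
    for order in permutations(variables):
        # An order corresponds to an acyclic tournament: each earlier
        # variable points to each later one.
        total = sum(
            score[(order[i], order[j])]
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if isclose(total, best_total):
            best_orders.append(order)  # tie: same class of optimal orders
        elif total > best_total:
            best_total, best_orders = total, [order]
    return best_total, best_orders


# Made-up scores: A clearly precedes B and C; B versus C is undecided.
example_score = {
    ("A", "B"): 0.75, ("B", "A"): 0.25,
    ("A", "C"): 0.75, ("C", "A"): 0.25,
    ("B", "C"): 0.5, ("C", "B"): 0.5,
}
total, orders = best_causal_orders(["A", "B", "C"], example_score)
print(total, orders)
```

In this toy example two orders tie, which illustrates why the result is a class of causal orders rather than a single one: the pair on which the optimal orders disagree would stay undirected in a partially directed summary.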
