Abstract
Neural Disjunctive Normal Form (DNF) based models are powerful and interpretable approaches to neuro-symbolic learning and have shown promising results in classification and reinforcement learning settings without prior knowledge of the tasks. However, their performance is degraded by the thresholding of the post-training symbolic translation process. We show here that part of the performance degradation during translation is due to its failure to disentangle the learned knowledge represented in the form of the networks' weights. We address this issue by proposing a new disentanglement method; by splitting nodes that encode nested rules into smaller independent nodes, we are able to better preserve the models' performance. Through experiments on binary, multiclass, and multilabel classification tasks (including those requiring predicate invention), we demonstrate that our disentanglement method provides compact and interpretable logical representations for the neural DNF-based models, with performance closer to that of their pre-translation counterparts. Our code is available at https://github.com/kittykg/disentangling-ndnf-classification.