TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems

Abstract

Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on State-of-the-Art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.
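To illustrate why split values and decision paths can leak information, the following is a minimal sketch of the underlying intuition only. It is not the TimberStrike attack itself, which the abstract describes as optimization-based. The function name `path_to_bounds`, the tuple layout of a path, and the example thresholds are all hypothetical and chosen for illustration. The sketch assumes that a sample goes left when its feature value is below the split threshold, and shows that each root-to-leaf path confines every sample reaching that leaf to an interval per feature.

```python
import math


def path_to_bounds(path, n_features):
    """Intersect the split constraints along one root-to-leaf path.

    path: list of (feature_index, threshold, went_left) tuples
          (hypothetical layout). went_left=True is assumed to mean
          the sample satisfied x[feature_index] < threshold.
    Returns one (low, high) interval per feature.
    """
    bounds = [(-math.inf, math.inf) for _ in range(n_features)]
    for feature, threshold, went_left in path:
        low, high = bounds[feature]
        if went_left:
            # Left branch: value lies below the threshold.
            high = min(high, threshold)
        else:
            # Right branch: value lies at or above the threshold.
            low = max(low, threshold)
        bounds[feature] = (low, high)
    return bounds


# Made-up path over two features (e.g. age and BMI in a health dataset).
path = [(0, 50.0, False), (1, 30.0, True), (0, 65.0, True)]
bounds = path_to_bounds(path, n_features=2)
print(bounds)
```

In this toy case the three splits already narrow the first feature to the range from 50 to 65 and cap the second at 30. Stacking many such paths across the trees of a boosted ensemble tightens these intervals further, which is the kind of signal a reconstruction attack can build on.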
