LSM-2: Learning from Incomplete Wearable Sensor Data

Abstract

Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression, and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance and, critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
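The idea of treating inherited and artificially introduced missingness with the same mask token can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`combine_masks`, `apply_mask_token`, `reconstruction_loss`), the random selection of positions to hide, the scalar placeholder standing in for a learnable mask token, and the choice to score reconstruction only on artificially hidden positions are all assumptions made for illustration.

```python
import numpy as np


def combine_masks(inherited_mask, mask_ratio, rng):
    """Hypothetical helper: pick extra positions to hide among observed ones.

    inherited_mask: bool array, True where a sensor value is already missing.
    mask_ratio: fraction of the observed positions to hide artificially.
    Returns (artificial_mask, union_mask), both shaped like inherited_mask.
    """
    inherited_mask = np.asarray(inherited_mask, dtype=bool)
    flat_inherited = inherited_mask.ravel()

    # Only positions that actually carry data can be hidden artificially.
    observed_idx = np.flatnonzero(~flat_inherited)
    n_hide = int(round(mask_ratio * observed_idx.size))
    hidden_idx = rng.choice(observed_idx, size=n_hide, replace=False)

    flat_artificial = np.zeros(flat_inherited.size, dtype=bool)
    flat_artificial[hidden_idx] = True
    artificial_mask = flat_artificial.reshape(inherited_mask.shape)

    # The model input treats both kinds of missingness identically.
    union_mask = inherited_mask | artificial_mask
    return artificial_mask, union_mask


def apply_mask_token(values, union_mask, mask_token):
    """Replace every masked position with the (here scalar) mask token."""
    return np.where(union_mask, mask_token, values)


def reconstruction_loss(pred, target, artificial_mask):
    """Mean squared error over artificially hidden positions only.

    Assumption: ground truth exists only where data was observed, so
    inherited gaps are excluded from the loss.
    """
    diff = (pred - target)[artificial_mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0
```

In this sketch, a sample with gaps is passed through `combine_masks` to hide some of its observed values, `apply_mask_token` builds the model input, and `reconstruction_loss` compares a model's output against the original values at the hidden positions. At inference time only the inherited mask would be applied, which is how a single mask token could cover both cases.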
