GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM

  • 2025-07-17 17:42:29
  • Kyeongjin Ahn, Sungwon Han, Seungeon Lee, Donghyun Ahn, Hyoshin Kim, Jungwon Kim, Jihee Kim, Sangyoon Park, Meeyoung Cha

Abstract

Socio-economic indicators like regional GDP, population, and education levels are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data, with the LLM functioning as a data engineer by extracting informative features to enable effective estimation in few-shot settings. Specifically, our model obtains contextual relationships between data features and the target indicator, categorizing their correlations as positive, negative, mixed, or irrelevant. These features are then fed into the linear estimator with tailored weight constraints for each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and integrates them, along with nonlinear transformations. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
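
The idea of a linear estimator with per-category weight constraints can be illustrated with a minimal sketch. The abstract does not spell out the exact constraints, so the mapping used here is an assumption: weights of "positive" features are kept non-negative, "negative" ones non-positive, "mixed" ones are left free, and "irrelevant" ones are fixed at zero. The function names (`project`, `fit_constrained`), the projected-gradient solver, and the toy data are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: sign-constrained linear regression fitted by
# projected gradient descent. The category-to-constraint mapping below
# is an assumption for illustration, not the paper's stated method.

def project(w, category):
    """Clip one weight to the range assumed for its category."""
    if category == "positive":
        return max(w, 0.0)   # weight must be >= 0
    if category == "negative":
        return min(w, 0.0)   # weight must be <= 0
    if category == "irrelevant":
        return 0.0           # feature is effectively dropped
    return w                 # "mixed": no sign constraint


def fit_constrained(X, y, categories, lr=0.01, steps=5000):
    """Minimize mean squared error subject to the per-category constraints."""
    n, d = len(X), len(categories)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Residuals of the current linear prediction.
        res = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
               for row, yi in zip(X, y)]
        grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(res, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(res)
        # Gradient step, then projection back onto the allowed range.
        w = [project(wj - lr * gj, c)
             for wj, gj, c in zip(w, grad_w, categories)]
        b -= lr * grad_b
    return w, b


# Toy few-shot data (made up): six regions, four features.
X = [
    [1.0, 0.5, 0.2, 0.9],
    [0.8, 0.1, 0.7, 0.3],
    [0.3, 0.9, 0.4, 0.5],
    [0.6, 0.4, 0.9, 0.1],
    [0.2, 0.7, 0.1, 0.8],
    [0.9, 0.2, 0.5, 0.6],
]
y = [2.0 * r[0] - 1.0 * r[1] + 0.5 * r[2] for r in X]
categories = ["positive", "negative", "mixed", "irrelevant"]

weights, bias = fit_constrained(X, y, categories)
print(weights, bias)
```

With only a handful of labeled samples, constraints of this kind shrink the space of admissible models, which is one plausible reading of why categorized features help in few-shot settings.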

 
