Abstract
Despite the impressive performance of large language models (LLMs) in general domains, they often underperform in specialized domains. Existing approaches typically rely on data synthesis methods and yield promising results by using unlabeled data to capture domain-specific features. However, these methods either incur high computational costs or suffer from performance limitations, while also demonstrating insufficient generalization across different tasks. To address these challenges, we propose AQuilt, a framework for constructing instruction-tuning data for any specialized domain from corresponding unlabeled data, including Answer, Question, Unlabeled data, Inspection, Logic, and Task type. By incorporating logic and inspection, we encourage reasoning processes and self-inspection to enhance model performance. Moreover, customizable task instructions enable high-quality data generation for any task. Accordingly, we construct a dataset of 703k examples to train a powerful data synthesis model. Experiments show that AQuilt is comparable to DeepSeek-V3 while incurring just 17% of the production cost. Further analysis demonstrates that our generated data exhibits higher relevance to downstream tasks. Source code, models, and scripts are available at https://github.com/Krueske/AQuilt.