Abstract
Machine learning force fields have emerged as promising tools for molecular dynamics (MD) simulations, potentially offering quantum-mechanical accuracy with the efficiency of classical MD. Inspired by foundational large language models, recent years have seen considerable progress in developing foundational atomistic models, sometimes referred to as universal force fields, designed to cover most elements in the periodic table. This Perspective adopts a practitioner's viewpoint to ask a critical question: Are these foundational atomistic models reliable for one of their most compelling applications, in particular simulating finite-temperature dynamics? Instead of a broad benchmark, we use the canonical ferroelectric-paraelectric phase transition in PbTiO$_3$ as a focused case study to evaluate prominent foundational atomistic models. Our findings suggest a potential disconnect between static accuracy and dynamic reliability. While 0 K properties are often well-reproduced, we observed that the models can struggle to consistently capture the correct phase transition, sometimes exhibiting simulation instabilities. We believe these challenges may stem from inherent biases in training data and a limited description of anharmonicity. These observed shortcomings, though demonstrated on a single system, appear to point to broader, systemic challenges that can be addressed with targeted fine-tuning. This Perspective serves not to rank models, but to initiate a crucial discussion on the practical readiness of foundational atomistic models and to explore future directions for their improvement.