Abstract
While previous AI Scientist systems can generate novel findings, they oftenlack the focus to produce scientifically valuable contributions that addresspressing human-defined challenges. We introduce DeepScientist, a systemdesigned to overcome this by conducting goal-oriented, fully autonomousscientific discovery over month-long timelines. It formalizes discovery as aBayesian Optimization problem, operationalized through a hierarchicalevaluation process consisting of "hypothesize, verify, and analyze". Leveraginga cumulative Findings Memory, this loop intelligently balances the explorationof novel hypotheses with exploitation, selectively promoting the most promisingfindings to higher-fidelity levels of validation. Consuming over 20,000 GPUhours, the system generated about 5,000 unique scientific ideas andexperimentally validated approximately 1100 of them, ultimately surpassinghuman-designed state-of-the-art (SOTA) methods on three frontier AI tasks by183.7\%, 1.9\%, and 7.9\%. This work provides the first large-scale evidence ofan AI achieving discoveries that progressively surpass human SOTA on scientifictasks, producing valuable findings that genuinely push the frontier ofscientific discovery. To facilitate further research into this process, we willopen-source all experimental logs and system code athttps://github.com/ResearAI/DeepScientist/.