Abstract
Agent-based simulation platforms play a key role in enabling fast-to-run evolution experiments that can be precisely controlled and observed in detail. Availability of high-resolution snapshots of lineage ancestries from digital experiments, in particular, is key to investigations of evolvability and open-ended evolution, as well as to providing a validation testbed for bioinformatics method development. Ongoing advances in AI/ML hardware accelerator devices, such as the 850,000-processor Cerebras Wafer-Scale Engine (WSE), are poised to broaden the scope of evolutionary questions that can be investigated in silico. However, constraints in memory capacity and locality characteristic of these systems introduce difficulties in exhaustively tracking phylogenies at runtime. To overcome these challenges, recent work on hereditary stratigraphy algorithms has developed space-efficient genetic markers to facilitate fully decentralized estimation of relatedness among digital organisms. However, in existing work, compute time to reconstruct phylogenies from these genetic markers has proven a limiting factor in achieving large-scale phyloanalyses. Here, we detail an improved trie-building algorithm designed to produce reconstructions equivalent to existing approaches. For modestly sized 10,000-tip trees, the proposed approach achieves a 300-fold speedup versus the existing state of the art. Finally, using 1-billion-genome datasets drawn from WSE simulations encompassing 954 trillion replication events, we report a pair of large-scale phylogeny reconstruction trials, achieving end-to-end reconstruction times of 2.6 and 2.9 hours. In substantially improving reconstruction scaling and throughput, the presented work establishes a key foundation to enable powerful high-throughput phyloanalysis techniques in large-scale digital evolution experiments.
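To make the idea of trie-based reconstruction concrete, the following is a minimal illustrative sketch and not the algorithm detailed in this work. It assumes, for simplicity, that every genome retains markers at the same generations, so each genome reduces to an ordered tuple of marker values; the names `TrieNode`, `build_trie`, and `shared_depth` are hypothetical. Genomes that share a leading run of markers are placed under a common trie path, which serves as an estimate of shared ancestry.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps a marker value to the child node reached by that value.
    children: dict = field(default_factory=dict)
    # Labels of genomes whose marker sequence ends at this node.
    taxa: list = field(default_factory=list)


def build_trie(genomes):
    """Insert each genome's marker sequence into a trie, oldest marker first.

    Assumption: all genomes hold markers for the same generations, so
    position i in every tuple refers to the same point in history.
    """
    root = TrieNode()
    for label, markers in genomes.items():
        node = root
        for value in markers:
            node = node.children.setdefault(value, TrieNode())
        node.taxa.append(label)
    return root


def shared_depth(a, b):
    """Count the leading markers two genomes have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


# Toy data: four genomes, each with four marker values (made up for illustration).
genomes = {
    "a": (7, 3, 9, 1),
    "b": (7, 3, 9, 4),
    "c": (7, 3, 2, 8),
    "d": (5, 6, 0, 2),
}
root = build_trie(genomes)
```

In this toy setup, genomes "a" and "b" diverge only at their last marker and so share the deepest trie path, while "d" branches off at the root. A practical implementation would also need to handle genomes that retain markers at differing generations and the possibility of unrelated lineages carrying identical marker values by chance, neither of which is addressed here.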