Abstract
High-throughput phenotyping automates the mapping of patient signs to standardized ontology concepts and is essential for precision medicine. This study evaluates automated phenotyping of clinical summaries from the Online Mendelian Inheritance in Man (OMIM) database using large language models. Because of their rich phenotype data, these summaries can serve as surrogates for physician notes. We compare the performance of GPT-4 and GPT-3.5-Turbo. Our results indicate that GPT-4 outperforms GPT-3.5-Turbo in identifying, categorizing, and normalizing signs, achieving concordance with manual annotators comparable to inter-rater agreement. Despite some limitations in sign normalization, the extensive pre-training of GPT-4 yields high performance and generalizability across several phenotyping tasks while obviating the need for manually annotated training data. We expect large language models to become the dominant method for automating high-throughput phenotyping of clinical text.