Repurposing the scientific literature with vision-language models

  • 2025-04-28 01:52:00
  • Anton Alyakin, Jaden Stryker, Daniel Alexander Alber, Karl L. Sangwon, Jin Vivian Lee, Brandon Duderstadt, Akshay Save, David Kurland, Spencer Frome, Shrutika Singh, Jeff Zhang, Eunice Yang, Ki Yun Park, Cordelia Orillac, Aly A. Valliani, Sean Neifert, Albert Liu, Aneek Patel, Christopher Livia, Darryl Lau, Ilya Laufer, Peter A. Rozman, Eveline Teresa Hidalgo, Howard Riina, Rui Feng, Todd Hollon, Yindalon Aphinyanaphongs, John G. Golfinos, Laura Snyder, Eric Leuthardt, Douglas Kondziolka, Eric Karl Oermann

Abstract

Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content - without large-scale internet data - results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.

 
