Abstract
Scientific knowledge is growing rapidly, making it difficult to track progress and high-level conceptual links across broad disciplines. While tools like citation networks and search engines help retrieve related papers, they lack the abstraction needed to represent the density and structure of activity across subfields. We motivate SCIENCE HIERARCHOGRAPHY, the goal of organizing scientific literature into a high-quality hierarchical structure that spans multiple levels of abstraction, from broad domains to specific studies. Such a representation can provide insights into which fields are well-explored and which are under-explored. To achieve this goal, we develop a hybrid approach that combines efficient embedding-based clustering with LLM-based prompting, striking a balance between scalability and semantic precision. Compared to LLM-heavy methods like iterative tree construction, our approach achieves superior quality-speed trade-offs. Our hierarchies capture different dimensions of research contributions, reflecting the interdisciplinary and multifaceted nature of modern science. We evaluate their utility by measuring how effectively an LLM-based agent can navigate the hierarchy to locate target papers. Results show that our method improves interpretability and offers an alternative pathway for exploring scientific literature beyond traditional search methods. Code, data and demo are available at https://github.com/JHU-CLSP/science-hierarchography