Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

  • 2018-06-30 04:31:59
  • Marinka Zitnik, Francis Nguyen, Bo Wang, Jure Leskovec, Anna Goldenberg, Michael M. Hoffman


New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

