Towards the Latent Transcriptome

  • 2018-12-10 17:46:47
  • Assya Trofimov, Francis Dutil, Claude Perreault, Sebastien Lemieux, Yoshua Bengio, Joseph Paul Cohen

Abstract

In this work we propose a method to compute continuous embeddings for k-mers from raw RNA-seq data, without the need for alignment to a reference genome. The approach uses an RNN to transform k-mers of the RNA-seq reads into a 2-dimensional representation that is used to predict the abundance of each k-mer. We report that our model captures information about both DNA sequence similarity and DNA sequence abundance in the embedding latent space, which we call the Latent Transcriptome. We confirm the quality of these vectors by comparing them to known gene sub-structures and report that the latent space recovers exon information from raw RNA-seq data from acute myeloid leukemia patients. Furthermore, we show that this latent space allows the detection of genomic abnormalities such as translocations as well as patient-specific mutations, making this representation space useful for both visualization and analysis.
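The pipeline described above (count k-mers directly from raw reads, run each k-mer through an RNN to a 2-D point, and predict its abundance from that point) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are all hypothetical assumptions: the plain Elman RNN cell, the linear read-out to log(1 + count), the mean-squared-error objective, the names `count_kmers`, `KmerRNN` and `mse_loss`, and the default k. Training is omitted because it would require an autodiff framework. Only the forward pass and the loss are shown.

```python
import numpy as np
from collections import Counter

K = 24  # hypothetical default k-mer length; the abstract does not state k
BASES = "ACGT"


def count_kmers(reads, k=K):
    """Count every length-k substring across raw reads (no reference alignment)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set(BASES):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def one_hot(kmer):
    """Encode a k-mer as a (k, 4) one-hot matrix, one row per nucleotide."""
    x = np.zeros((len(kmer), 4))
    for i, base in enumerate(kmer):
        x[i, BASES.index(base)] = 1.0
    return x


class KmerRNN:
    """Hypothetical Elman RNN: k-mer -> 2-D embedding -> predicted log-abundance."""

    def __init__(self, hidden=32, emb_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (4, hidden))
        self.W_h = rng.normal(0.0, 0.1, (hidden, hidden))
        self.W_emb = rng.normal(0.0, 0.1, (hidden, emb_dim))
        self.w_out = rng.normal(0.0, 0.1, emb_dim)
        self.b_out = 0.0

    def embed(self, kmer):
        """Run the RNN over the nucleotides, then project the last hidden state to 2-D."""
        h = np.zeros(self.W_h.shape[0])
        for x_t in one_hot(kmer):
            h = np.tanh(x_t @ self.W_in + h @ self.W_h)
        return h @ self.W_emb

    def predict_log_count(self, kmer):
        """Linear read-out from the 2-D embedding to log(1 + count)."""
        return float(self.embed(kmer) @ self.w_out + self.b_out)


def mse_loss(model, counts):
    """Mean squared error between predicted and observed log(1 + count)."""
    errs = [(model.predict_log_count(kmer) - np.log1p(c)) ** 2
            for kmer, c in counts.items()]
    return float(np.mean(errs))


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "CGTACGTACG"]
    counts = count_kmers(reads, k=4)
    model = KmerRNN()
    print(counts)
    print(model.embed("ACGT"))  # a point in the 2-D latent space
    print(mse_loss(model, counts))
```

Because the abundance target is predicted only from the 2-D embedding, training would push k-mers with similar sequence and similar abundance toward nearby points. This is the property the abstract attributes to the Latent Transcriptome.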
