Engineering and Architecture
Oscar Yanes Torrado
Roger Guimerà Manrique
Technologies for Nanosystems, Bioengineering and Energy
A deep learning approach for identifying metabolites by mass spectrometry-based metabolomics
This frontier-science project will overcome one of the main barriers in the field of metabolomics: the structural identification of metabolites and the characterization of complete metabolomes. Identifying metabolites in eukaryotic and prokaryotic organisms (e.g., microbiota) and in environmental samples is the next frontier in metabolomics research. Much as protein search algorithms drove the progress of proteomics in the 1990s, the approach proposed in this project will lay the basis for characterizing the large number of metabolites that lack annotated tandem MS spectra or whose molecular structures are not listed in chemical databases.
Identifying metabolites from mass spectrometry (MS) analyses requires annotating MS1 and MS2 data. This includes reducing redundant signals in MS1 (mostly due to in-source phenomena such as cation adduction) and matching observed MSn (n ≥ 2) spectra to experimental spectra available in reference MS2 databases (e.g., MassBank, NIST). A complication of this strategy, however, is the poor coverage of primary and secondary metabolites (i.e., natural products) in standard reference databases: only about 10% of known small molecules in databases have experimental MS2 data from pure standards. In addition, many metabolites are still unknown and are extremely difficult to characterize, because neither their chemical structures nor annotated tandem MS spectra are available.
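The library-matching step described above can be illustrated with a minimal sketch. It assumes a simple binned cosine similarity as the score, which is one common choice and not necessarily the scoring used in this project. The function names, the bin width, and the toy peak lists are all hypothetical and are not taken from MassBank or NIST.

```python
import math

def bin_spectrum(peaks, bin_width=0.01):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins.

    Simplification: two peaks that differ slightly but fall on opposite
    sides of a bin edge are treated as different fragments.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(round(mz / bin_width))
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(query, reference, bin_width=0.01):
    """Cosine similarity between two binned MS2 spectra (0.0 to 1.0)."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(reference, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, library, bin_width=0.01):
    """Return (score, name) of the highest-scoring library spectrum."""
    scored = [(cosine_similarity(query, spectrum, bin_width), name)
              for name, spectrum in library.items()]
    return max(scored) if scored else (0.0, None)

# Toy data for illustration only: made-up peaks, not real reference spectra.
query = [(85.03, 30.0), (127.04, 100.0), (145.05, 55.0)]
library = {
    "compound_A": [(85.03, 25.0), (127.04, 100.0), (145.05, 60.0)],
    "compound_B": [(60.05, 100.0), (104.11, 40.0)],
}
score, name = best_match(query, library)
```

A score of this kind only ranks candidates that already exist in the library, which is why the limited coverage of reference databases restricts what can be identified this way.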
This project aims to develop a novel integrated computational workflow for non-targeted mass spectrometry-based metabolomics, covering the annotation of MS1 and MS2 (or MSn) data. The workflow builds on a recent patent application from our group (P202030061) and on a deep learning approach for MS2 annotation. The deep learning approach will use separate variational autoencoders to represent MS/MS spectral data and the chemical structures of known metabolites. The integrated workflow will be made publicly available to the wider scientific community as user-friendly software.
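For readers unfamiliar with variational autoencoders, the sketch below shows the two ingredients that distinguish them from plain autoencoders: sampling a latent vector with the reparameterization trick, and the KL divergence term that pulls the latent distribution toward a standard normal. It is a generic illustration in plain Python, not the project's model. The encoder and decoder networks are omitted, the squared-error reconstruction term is an assumption, and all function names are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def negative_elbo(x, x_reconstructed, mu, log_var):
    """Training loss: reconstruction error plus the KL regularizer.

    Squared error is assumed here as the reconstruction term.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, log_var)

# Illustrative values standing in for an encoder's output for one input.
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, -2.0]
z = reparameterize(mu, log_var, rng)
```

With one such model for spectra and another for chemical structures, each kind of data is mapped into its own latent space. How the two spaces are related to each other is not specified in the description above and is left out of this sketch.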
Ethics: This project does not involve ethical aspects.
Workplace location: Campus Sescelades, Tarragona
37.5 hours a week
27 April 2021
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 945413.