TY - JOUR
T1 - Applications of Machine Learning in Human Microbiome Studies
T2 - A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment
AU - Marcos-Zambrano, Laura Judith
AU - Karaduzovic-Hadziabdic, Kanita
AU - Loncar Turukalo, Tatjana
AU - Przymus, Piotr
AU - Trajkovik, Vladimir
AU - Aasmets, Oliver
AU - Berland, Magali
AU - Gruca, Aleksandra
AU - Hasic, Jasminka
AU - Hron, Karel
AU - Klammsteiner, Thomas
AU - Kolev, Mikhail
AU - Lahti, Leo
AU - Lopes, Marta B.
AU - Moreno, Victor
AU - Naskinova, Irina
AU - Org, Elin
AU - Paciência, Inês
AU - Papoutsoglou, Georgios
AU - Shigdel, Rajesh
AU - Stres, Blaz
AU - Vilne, Baiba
AU - Yousef, Malik
AU - Tsamardinos, Ioannis
AU - Zdravevski, Eftim
AU - Carrillo de Santa Pau, Enrique
AU - Claesson, Marcus J.
AU - Moreno-Indias, Isabel
AU - Truu, Jaak
AU - ML4Microbiome
N1 - Funding Information:
The authors are grateful to all members of COST Action CA18131 "Statistical and machine learning techniques in human microbiome studies" for their contributions to the discussion of the evaluation process for ML methods currently used in microbiome research during the Action workshops. Funding: This study was supported by COST Action CA18131 "Statistical and machine learning techniques in human microbiome studies"; Estonian Research Council grant PRG548 (JT); and Spanish State Research Agency Juan de la Cierva Grant IJC2019-042188-I (LM-Z). EO was funded and OA was supported by Estonian Research Council grant PUT 1371 and EMBO Installation grant 3573. AG was supported by the Statutory Research project of the Department of Computer Networks and Systems.
Publisher Copyright:
Copyright © 2021 Marcos-Zambrano, Karaduzovic-Hadziabdic, Loncar Turukalo, Przymus, Trajkovik, Aasmets, Berland, Gruca, Hasic, Hron, Klammsteiner, Kolev, Lahti, Lopes, Moreno, Naskinova, Org, Paciência, Papoutsoglou, Shigdel, Stres, Vilne, Yousef, Zdravevski, Tsamardinos, Carrillo de Santa Pau, Claesson, Moreno-Indias and Truu.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/2/19
Y1 - 2021/2/19
AB - The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to explore host-microbiome associations in depth, along with their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets while taking into account the peculiarities of microbiome data, i.e., the compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host phenotypes based on taxonomy-informed feature selection, in order to establish associations between the microbiome and disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used for classification and prediction in microbiology, to infer host phenotypes and predict diseases, and to stratify patients by characterizing state-specific microbial signatures in their microbial communities. Here we review state-of-the-art ML methods and the respective software applied in human microbiome studies; the review was performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to the bacterial community, many of the algorithms could be applied in general, regardless of the feature type. This literature and software review, which covers a broad topic, follows the scoping review methodology.
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and a ranking of the related research papers based on a learning-to-rank approach.
KW - biomarker identification
KW - disease prediction
KW - feature selection
KW - machine learning
KW - microbiome
UR - http://www.scopus.com/inward/record.url?scp=85102368878&partnerID=8YFLogxK
U2 - 10.3389/fmicb.2021.634511
DO - 10.3389/fmicb.2021.634511
M3 - Review article
AN - SCOPUS:85102368878
SN - 1664-302X
VL - 12
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
M1 - 634511
ER -