New Amalgamation-Based Methods for Microbiome Compositional Data Analysis

Seminar Details
Thursday, October 27, 2022 - 12:00pm to 1:00pm

Speaker

Gen Li, PhD
John G. Searle Associate Professor of Biostatistics, Department of Biostatistics, School of Public Health

Abstract

Microbiome data are complex in nature, involving high dimensionality, compositionality, zero inflation, and taxonomic hierarchy. Compositional data reside in a simplex that does not admit the standard Euclidean geometry. Most existing methods rely on transformations that are inadequate or even inappropriate for modeling data with excessive zeros and taxonomic structure. In this talk, I will introduce novel amalgamation-based methods for microbiome compositional data analysis. In particular, we first develop a relative-shift regression framework that directly uses compositions as predictors. The new framework provides a paradigm shift for compositional regression and offers a superior biological interpretation. New equi-sparsity and taxonomy-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. As a result, the framework can automatically identify clinically relevant microbes even if they are important at different taxonomic levels. We also develop a new dimension reduction paradigm for microbiome compositional data based on the amalgamation operation. Our approach aggregates the compositions into a smaller number of principal compositions, guided by the available taxonomic structure, by minimizing a properly measured loss of information. We further demonstrate the efficacy of the new methods in simulations and on several real microbiome studies.
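
For readers unfamiliar with the amalgamation operation mentioned in the abstract, the following is a minimal Python sketch of the basic idea: the parts of a composition that share a label at a coarser taxonomic level are summed, so the result is again a composition and zeros need no special handling. The function name `amalgamate`, the group labels, and the toy abundances are hypothetical illustrations. This is not the speaker's method and not the API of the tools linked below; in particular, it uses a fixed grouping rather than one chosen by minimizing a loss of information.

```python
import numpy as np

def amalgamate(compositions, groups):
    """Sum the parts of each composition that share a group label.

    compositions: (n_samples, n_taxa) array whose rows sum to 1.
    groups: one label per taxon (hypothetical coarser taxonomic level).
    """
    X = np.asarray(compositions, dtype=float)
    labels = sorted(set(groups))
    # Membership matrix: entry (j, k) is 1 if taxon j belongs to group k.
    membership = np.array(
        [[1.0 if g == lab else 0.0 for lab in labels] for g in groups]
    )
    # Each amalgamated part is the sum of its member taxa.
    return labels, X @ membership

# Toy data (illustrative only): five taxa, two samples, with zeros.
groups = ["A", "A", "B", "C", "C"]
X = [
    [0.2, 0.0, 0.5, 0.3, 0.0],
    [0.0, 0.0, 0.4, 0.1, 0.5],
]
labels, amalgamated = amalgamate(X, groups)
```

Because amalgamation only adds parts together, each row of `amalgamated` still sums to 1, and a group made up entirely of zero-abundance taxa simply yields a zero part.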

Tool Link (Relative Shift): https://github.com/reagan0323/RelativeShift
Tool Link (PAA): https://github.com/LiYanStat/paaPack

This presentation will be held in 2036 Palmer Commons. There will also be a remote viewing option via Zoom.