
Engineering >> 2021, Volume 7, Issue 12 doi: 10.1016/j.eng.2020.05.028

Expanding the Scope of Multivariate Regression Approaches in Cross-Omics Research

a CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
b Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
c University of Chinese Academy of Sciences, Beijing 100049, China
d Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, USA

Received: 2019-10-08 Revised: 2020-03-14 Accepted: 2020-05-25 Available online: 2021-05-19


Abstract

Recent technological advances have dramatically increased the amount of high-dimensional data and, with it, the demand for proper and efficient multivariate regression methods. Traditional multivariate approaches such as principal component analysis have been used broadly in many research areas, including investment analysis, image identification, and population genetic structure analysis. However, these common approaches have two limitations: they ignore the correlations between responses, and they select variables inefficiently. In this article, we therefore introduce the reduced rank regression method and two of its extensions, sparse reduced rank regression and subspace assisted regression with row sparsity, which have the potential to meet these demands and thus improve the interpretability of regression models. We conducted a simulation study to evaluate the performance of these methods and compared them with several other variable selection methods. We also provide suggestions for choosing a method in different application scenarios, based on predictive ability and variable selection accuracy. Finally, to demonstrate the practical value of these methods in microbiome research, we applied our chosen method to real population-level microbiome data, and the results validated the method. These extensions provide valuable guidelines for future omics research, especially with respect to multivariate regression, and could pave the way for novel discoveries in microbiome and related research fields.
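As a rough illustration of the reduced rank idea named in the abstract, the sketch below fits the classical rank-constrained least squares estimator with identity weighting: an ordinary least squares fit followed by projection onto the leading right singular vectors of the fitted values. It is not the authors' implementation and does not cover the sparse extensions; the function name `reduced_rank_regression`, the simulated data, and all dimensions are assumptions chosen for the example.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Least squares estimate of B in Y ~ X @ B subject to rank(B) <= rank.

    Hypothetical helper for illustration; assumes X has full column rank
    and uses identity weighting of the responses.
    """
    # Unconstrained ordinary least squares coefficients, shape (p, q).
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Fitted responses from the unconstrained model, shape (n, q).
    Y_hat = X @ B_ols
    # Right singular vectors of the fitted values span the response
    # directions that the predictors explain best.
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T  # shape (q, rank)
    # Project the OLS coefficients onto the leading `rank` directions.
    return B_ols @ V_r @ V_r.T


# Simulated data (assumed sizes): 10 predictors, 6 correlated responses
# driven by a rank-2 coefficient matrix.
rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 6, 2
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_rrr = reduced_rank_regression(X, Y, rank=r)
rel_err = np.linalg.norm(B_rrr - B_true) / np.linalg.norm(B_true)
print("estimated rank:", np.linalg.matrix_rank(B_rrr))
print("relative error:", rel_err)
```

Because the rank constraint ties all responses to a few shared latent directions, the estimate borrows strength across correlated responses, which fitting each response separately cannot do. Variable selection would require an additional sparsity penalty on the rows of the coefficient matrix, which this sketch omits.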


