
Expanding the Scope of Multivariate Regression Approaches in Cross-Omics Research
Xiaoxi Hu, Yue Ma, Yakun Xu, Peiyao Zhao, Jun Wang
Engineering, 2021, Vol. 7, Issue 12: 1725-1731.
Recent technological advances have led to a dramatic increase in the amount of high-dimensional data and, with it, a growing demand for appropriate and efficient multivariate regression methods. Many traditional multivariate approaches, such as principal component analysis (PCA), have been widely used in research areas including investment analysis, image recognition, and population genetic structure analysis. However, these common approaches have two limitations: they ignore the correlations among responses, and their variable selection is inefficient. In this article, we therefore introduce the reduced rank regression (RRR) method and two of its extensions, sparse reduced rank regression and subspace assisted regression with row sparsity, which have the potential to meet these demands and to improve the interpretability of regression models. We conducted a simulation study to evaluate their performance and compared them with several other variable selection methods. For different application scenarios, we also provide recommendations for choosing among the methods based on predictive ability and variable selection accuracy. Finally, to demonstrate the practical value of these methods in microbiome research, we applied the selected method to real population-level microbiome data, and the results confirmed the effectiveness of the method. These extended methods provide valuable guidance for future omics research, especially research using multivariate regression, and could pave the way for new discoveries in microbiome research and related fields.
Multivariate regression methods / Reduced rank regression / Sparsity / Dimensionality reduction / Variable selection
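The abstract names reduced rank regression and its sparse extension without showing how either is fitted. The NumPy sketch below illustrates both under our own assumptions; it is not the authors' implementation. Classical RRR is shown with identity weighting: an ordinary least squares fit projected onto the leading right singular vectors of the fitted values. The sparse variant minimizes a squared loss plus a row-wise group penalty on B, subject to orthonormal A, by alternating two steps. The B-step is a group lasso solved by proximal gradient. The A-step is an orthogonal Procrustes update. The function names `rrr` and `srrr`, the single penalty `lam`, the iteration counts, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def rrr(X, Y, rank):
    """Classical reduced-rank regression with identity weighting.

    Fits ordinary least squares, then projects the coefficient matrix onto
    the leading `rank` right singular vectors of the fitted values X @ B_ols.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)       # p x q OLS fit
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                                   # q x rank
    return B_ols @ V_r @ V_r.T                          # rank <= `rank`


def srrr(X, Y, rank, lam, n_outer=50, n_inner=100):
    """Row-sparse reduced-rank regression (illustrative sketch).

    Minimises 0.5 * ||Y - X B A^T||_F^2 + lam * sum_j ||B[j, :]||_2
    subject to A^T A = I, by alternating between B and A.
    A zero row j of the returned C = B A^T means predictor j is dropped.
    """
    p = X.shape[1]
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    A = Vt[:rank].T                          # q x rank, orthonormal columns
    B = np.zeros((p, rank))
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_outer):
        # B-step: with A fixed, the loss equals (up to a constant) a group
        # lasso of Y A on X with one group per row of B; proximal gradient.
        YA = Y @ A
        for _ in range(n_inner):
            Z = B - step * (X.T @ (X @ B - YA))
            norms = np.linalg.norm(Z, axis=1, keepdims=True)
            # Group soft-thresholding: shrink each row, zeroing small ones
            B = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * Z
        # A-step: with B fixed, the optimal orthonormal A is the orthogonal
        # Procrustes solution U W^T, where Y^T X B = U S W^T.
        U, _, Wt = np.linalg.svd(Y.T @ X @ B, full_matrices=False)
        A = U @ Wt
    return B @ A.T                           # p x q coefficient matrix


# Hypothetical simulated data: 5 of 20 predictors are active, true rank is 2.
rng = np.random.default_rng(0)
n, p, q, r = 100, 20, 8, 2
B_true = np.zeros((p, q))
B_true[:5] = rng.normal(size=(5, r)) @ rng.normal(size=(r, q))
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

C_rrr = rrr(X, Y, rank=r)
C_srrr = srrr(X, Y, rank=r, lam=10.0)
selected = np.flatnonzero(np.linalg.norm(C_srrr, axis=1) > 1e-8)
print("rank of RRR estimate:", np.linalg.matrix_rank(C_rrr))
print("predictors kept by sparse RRR:", selected)
```

Plain RRR caps the rank of the coefficient matrix but keeps every predictor. The group penalty in the sparse variant zeroes whole rows, so the same fit also performs variable selection; a larger `lam` zeroes more rows. Choosing `lam` and `rank` (for example by cross-validation) is left out of this sketch.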