Developing Machine Learning Methods for Accurate Prediction of Disease-Resistance Phenotypes from Crop Genotypes

Qi Liu, Shimin Zuo, Shasha Peng, Hao Zhang, Ye Peng, Wei Li, Yehui Xiong, Runmao Lin, Zhiming Feng, Huihui Li, Jun Yang, Guo-Liang Wang, Houxiang Kang

Engineering, 2024, Vol. 40, Issue 9: 100-110.

DOI: 10.1016/j.eng.2024.03.014
Research Article

Development of Machine Learning Methods for Accurate Prediction of Plant Disease Resistance

Abstract

Traditional phenotypic screening of plants for disease resistance is both time-consuming and costly. Genomic selection offers a potential way to improve efficiency, but accurately predicting plant disease resistance remains a challenge. In this study, we evaluated eight machine learning (ML) methods for predicting plant disease resistance: random forest classification (RFC), support vector classification (SVC), light gradient boosting machine (lightGBM), random forest classification plus kinship (RFC_K), support vector classification plus kinship (SVC_K), light gradient boosting machine plus kinship (lightGBM_K), deep neural network genomic prediction (DNNGP), and densely connected convolutional networks (DenseNet). Our results demonstrate that the three plus kinship (K) methods developed in this study achieved high prediction accuracy. Specifically, when trained and applied to the rice diversity panel I (RDPI), these methods achieved accuracies of up to 95% for rice blast (RB), 85% for rice black-streaked dwarf virus (RBSDV), and 85% for rice sheath blight (RSB). The plus K models also performed well in predicting wheat blast (WB) and wheat stripe rust (WSR), with mean accuracies of up to 90% and 93%, respectively. To assess the generalizability of our models, we applied the trained plus K methods to predict RB resistance in an independent population, rice diversity panel II (RDPII), and in parallel evaluated the RB resistance of RDPII cultivars by spray inoculation. Comparing the predictions with the spray inoculation results, we found that the accuracy of the plus K methods reached 91%. These findings highlight the effectiveness of the plus K methods (RFC_K, SVC_K, and lightGBM_K) in accurately predicting plant resistance to RB, RBSDV, RSB, WB, and WSR. The methods developed in this study not only provide valuable strategies for predicting disease resistance, but also pave the way for using machine learning to streamline genome-based crop breeding.
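
The abstract does not say how the kinship (K) term enters the plus K models. As a rough illustration only, the sketch below assumes that K is a VanRaden-style genomic relationship matrix computed from marker genotypes coded 0/1/2, and that each accession's kinship row is appended to its genotype features before a classifier such as RFC, SVC, or lightGBM is trained. The function names `vanraden_kinship` and `add_kinship_features` and the toy genotypes are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def vanraden_kinship(genotypes: List[List[int]]) -> Matrix:
    """Genomic relationship matrix from 0/1/2 allele counts.

    Assumption: VanRaden-style scaling, G = Z Z^T / (2 * sum_j p_j (1 - p_j)),
    where Z is the genotype matrix centred by twice the allele frequency.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Centre every marker column so that it sums to zero across accessions.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; kinship is undefined")
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale for k in range(n)]
        for i in range(n)
    ]


def add_kinship_features(genotypes: List[List[int]], kinship: Matrix) -> Matrix:
    """Append each accession's kinship row to its marker vector (hypothetical design)."""
    return [[float(g) for g in geno] + list(kin) for geno, kin in zip(genotypes, kinship)]


# Toy data: 4 accessions x 5 markers, invented for illustration.
toy_genotypes = [
    [0, 2, 1, 0, 2],
    [0, 2, 2, 0, 1],
    [2, 0, 0, 2, 0],
    [2, 1, 0, 2, 0],
]
toy_kinship = vanraden_kinship(toy_genotypes)
toy_features = add_kinship_features(toy_genotypes, toy_kinship)
```

The resulting feature rows (markers followed by kinship values) could then be passed to any standard classifier. Whether this matches the authors' construction would need to be checked against the full text of the article.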

Keywords

Predicting plant disease resistance / Genomic selection / Machine learning / Genome-wide association study

Cite this article

Qi Liu, Shimin Zuo, Shasha Peng, et al. Development of Machine Learning Methods for Accurate Prediction of Plant Disease Resistance. Engineering. 2024, 40(9): 100-110 https://doi.org/10.1016/j.eng.2024.03.014
