Development of Machine Learning Methods for Accurate Prediction of Plant Disease Resistance

Qi Liu, Shi-min Zuo, Shasha Peng, Hao Zhang, Ye Peng, Wei Li, Yehui Xiong, Runmao Lin, Zhiming Feng, Huihui Li, Jun Yang, Guo-Liang Wang, Houxiang Kang

Engineering, 2024, Vol. 40, Issue 9: 108-119. DOI: 10.1016/j.eng.2024.03.014


Abstract

The traditional method of screening plants for disease resistance phenotype is both time-consuming and costly. Genomic selection offers a potential solution to improve efficiency, but accurately predicting plant disease resistance remains a challenge. In this study, we evaluated eight different machine learning (ML) methods, including random forest classification (RFC), support vector classifier (SVC), light gradient boosting machine (lightGBM), random forest classification plus kinship (RFC_K), support vector classification plus kinship (SVC_K), light gradient boosting machine plus kinship (lightGBM_K), deep neural network genomic prediction (DNNGP), and densely connected convolutional networks (DenseNet), for predicting plant disease resistance. Our results demonstrate that the three plus kinship (K) methods developed in this study achieved high prediction accuracy. Specifically, these methods achieved accuracies of up to 95% for rice blast (RB), 85% for rice black-streaked dwarf virus (RBSDV), and 85% for rice sheath blight (RSB) when trained and applied to the rice diversity panel I (RDPI). Furthermore, the plus K models performed well in predicting wheat blast (WB) and wheat stripe rust (WSR) diseases, with mean accuracies of up to 90% and 93%, respectively. To assess the generalizability of our models, we applied the trained plus K methods to predict RB disease resistance in an independent population, rice diversity panel II (RDPII). Concurrently, we evaluated the RB resistance of RDPII cultivars using spray inoculation. Comparing the predictions with the spray inoculation results, we found that the accuracy of the plus K methods reached 91%. These findings highlight the effectiveness of the plus K methods (RFC_K, SVC_K, and lightGBM_K) in accurately predicting plant disease resistance for RB, RBSDV, RSB, WB, and WSR. 
The methods developed in this study not only provide valuable strategies for predicting disease resistance, but also pave the way for using machine learning to streamline genome-based crop breeding.

Keywords

Predicting plant disease resistance / Genomic selection / Machine learning / Genome-wide association study

Cite this article

Qi Liu, Shi-min Zuo, Shasha Peng, Hao Zhang, Ye Peng, Wei Li, Yehui Xiong, Runmao Lin, Zhiming Feng, Huihui Li, Jun Yang, Guo-Liang Wang, Houxiang Kang. Development of Machine Learning Methods for Accurate Prediction of Plant Disease Resistance. Engineering, 2024, 40(9): 108-119 DOI:10.1016/j.eng.2024.03.014


1. Introduction

Severe diseases caused by a variety of pathogens, such as fungi, bacteria, viruses, and other microorganisms, are among the primary factors contributing to reductions in crop yield. For instance, rice blast (RB) and rice sheath blight (RSB) diseases, caused by the fungal pathogens Magnaporthe oryzae (M. oryzae) and Rhizoctonia solani (R. solani), respectively, significantly reduce global rice (Oryza sativa L.) yields [1], [2]. Rice black-streaked dwarf virus (RBSDV), spread by small brown planthoppers (Laodelphax striatellus Fallén), severely affects yield in China and other East Asian countries [3]. Wheat blast (WB) disease and wheat stripe rust (WSR) disease, caused by the fungi Pyricularia graminis-tritici (Pygt) and Puccinia striiformis f. sp. tritici (Pst), respectively, pose significant challenges to global wheat production [4], [5].

The most effective and environmentally friendly method to manage diseases in rice and wheat is the use of resistant cultivars containing resistance genes (R genes) [3], [5], [6], [7]. However, R gene-mediated resistance often breaks down after years of large-scale cultivation, and evaluating disease resistance of diverse rice or wheat varieties based solely on the presence of known R genes often yields suboptimal results.

Since plant genotyping has become more affordable, thousands of genotypes of important crop plants, including rice, wheat, and maize, have been made publicly available [8], [9], [10]. Genome-wide association studies (GWAS), which can efficiently map traits based on genotypes and phenotypes, have emerged as a powerful tool for investigating complex traits in plants [11], [12]. GWAS has identified hundreds of marker-trait associations (MTAs) in both rice and wheat. For instance, in rice, 12 bacterial blight MTAs [13], 11 bacterial leaf streak MTAs [14], 27 sheath blight MTAs [15], 15 rice false smut MTAs [16], and more than 200 RB MTAs have been identified through GWAS [17], [18], [19]. Recent studies indicated that these loci contain not only dominant R genes but also susceptibility genes [20]. In addition, the microbiome-shaping gene (M gene) potentially plays a crucial role in conferring broad-spectrum disease resistance to plants [21]. These results suggest that targeting these resistance loci may be more effective than relying solely on known R genes. However, selecting and utilizing these MTAs in crop breeding is challenging.

Over the past two decades, a variety of genomic selection (GS) models have been implemented in plant breeding [22]. GS models can predict the genomic estimated breeding values (GEBVs) of genotyped individuals, reducing breeding time and improving selection accuracy [23], [24]. Researchers have developed GWAS-based tools for GS using genome-wide marker data. For example, the GMStool uses appropriate statistical and machine learning (ML)-based models to search for the optimal number of markers and select the best predictive model [25]. Taking rice and maize as examples, breeders successfully integrated associated loci from GWAS results into GS models, greatly improving the prediction accuracy of their important agronomic traits [26], [27].

Given the increasing quantity and complexity of biological data, there is an urgent need to incorporate ML algorithms to effectively manage the exponential growth of genomic data and gain insights into biological processes [28], [29]. ML methods have made significant advancements in various biological fields, including protein engineering [30] and early cancer detection [31]. Furthermore, ML methods are also being utilized in the agricultural industry, facilitating precise identification of crucial plant traits customized to specific requirements. This offers more cost-effective and efficient strategies for cultivating novel crop varieties, benefiting plant breeders [32], [33].

Various nonparametric ML methods have been applied to GS in plants. These methods include support vector machines (SVMs), reproducing kernel Hilbert spaces (RKHSs), neural networks (NNs), and random forests (RFs) [34], [35], [36]. The integration of ML into GS improves prediction accuracy for important traits, such as soybean yield [37] and wheat grain yield and protein content [38]. ML has also been reported in GS for plant disease resistance, for example against wheat rust [39], wheat fusarium head blight [40], and maize leaf blight [41]. Specifically, a study investigating the utility of ML in GS for disease resistance in wheat found that the prediction accuracy for fusarium head blight resistance reached a maximum of 57.50% using the RKHS method [35]. A deep neural network, the multi-layer perceptron (MLP), was utilized to predict resistance to wheat fusarium head blight [42], maize gray leaf spot, and maize Septoria [43]. In rice, the genomic Bayesian and best linear unbiased prediction (GBLUP) was utilized to predict RB disease resistance, with predictive abilities ranging from 15% to 72% across strains [44].

In this study, we integrated GWAS results, disease resistance phenotypes, and population kinship (K) information to develop three novel ML models, namely random forest classification plus kinship (RFC_K), support vector classification plus kinship (SVC_K), and light gradient boosting machine plus kinship (lightGBM_K). We compared these three models with five other ML methods and found that the RFC_K, SVC_K, and lightGBM_K models achieved high prediction accuracies for various diseases: up to 95% accuracy for RB resistance, and 85% accuracy for both RBSDV and RSB resistance. Additionally, the models exhibited accuracies of up to 90% and 93% for WB and WSR resistance, respectively. To validate the generalizability of our models, we applied them to an independent rice population, rice diversity panel II (RDPII), and obtained a high prediction accuracy of 90% for RB resistance when compared with actual spray inoculation results. Our method not only offers an effective approach to predict RB, RBSDV, RSB, WB, and WSR disease resistance but also paves the way for using ML to streamline genome-based crop breeding.

2. Materials and Methods

2.1. Plant and fungal materials

Rice diversity panel I (RDPI) is a publicly available germplasm collection comprising 413 Oryza sativa L. accessions gathered from ten geographic regions [45], [46]. Phenotype data of RDPI are available from our previous reports, including RB spray inoculation phenotypes in the greenhouse with two M. oryzae strains (RO1-1 and RB22) [17], RBSDV natural infection phenotypes in the field at two test locations, Kaifeng City in Henan Province and Yutai County in Shandong Province [47], and RSB spray inoculation phenotypes in the greenhouse with three R. solani strains (1-YN7, 2-MH12, and A-GN43) [6]. The global bread wheat breeding program of the International Maize and Wheat Improvement Center (CIMMYT) includes six panels of wheat population lines. WB and WSR phenotype data are available from previously published literature [48]. The RDPII collection contains 1445 accessions from 92 countries worldwide [49]. In this study, a core collection of 581 accessions from RDPII was obtained from the Guangdong Academy of Agricultural Science. M. oryzae strains RO1-1 and RB22 used for inoculation were stored in the laboratory.

2.2. Evaluation of RB disease resistance by spray inoculation

Three-week-old seedlings of the 581 rice accessions of RDPII were spray inoculated with RO1-1 and RB22 spores at a concentration of 5 × 10⁵ conidia∙mL⁻¹ in 0.1% Tween. Six days after inoculation, disease severity was assessed using the 0-9 blast scoring system (0: no disease symptoms; 1: small closed lesions, each less than 1 cm in length; 2: small lesions with grey centers; 3: small elliptical lesions with heavy borders, some exceeding 1 cm in length; 4: expanding elliptical lesions; 5: some forming patches, with the lesion area covering 10%-25%; 6: lesion area covering 26%-50%; 7: lesion area covering 51%-75%; 8: lesion area covering 76%-90%; 9: lesion area > 90% or complete leaf necrosis). Scores 0-3 indicate high resistance and scores 6-9 indicate high susceptibility. Three biological repeats were used for each rice accession [19], [50]. In this study, we focused on high resistance and high susceptibility accessions for the construction and evaluation of the model.
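The binarization of blast scores described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_blast_score` and the accession names are hypothetical, and the sketch takes one score per accession because the text does not state how the three biological repeats were combined.

```python
from typing import Optional


def classify_blast_score(score: int) -> Optional[str]:
    """Map a 0-9 blast score to 'R', 'S', or None (intermediate, excluded)."""
    if not 0 <= score <= 9:
        raise ValueError(f"score must be in 0..9, got {score}")
    if score <= 3:
        return "R"  # scores 0-3: high resistance
    if score >= 6:
        return "S"  # scores 6-9: high susceptibility
    return None  # scores 4-5: intermediate, not used for modelling


# Hypothetical accession names and scores, for illustration only.
scores = {"acc_A": 1, "acc_B": 5, "acc_C": 8}
labels = {name: classify_blast_score(s) for name, s in scores.items()}

# Keep only the clearly resistant or clearly susceptible accessions.
modelling_set = {name: lab for name, lab in labels.items() if lab is not None}
print(modelling_set)
```

Accessions with scores of 4 or 5 are dropped rather than forced into a class, which mirrors the focus on highly resistant and highly susceptible accessions.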

2.3. Selection of rice and wheat disease resistance-associated single-nucleotide polymorphisms (SNPs) for ML

The GWAS was conducted using the previously described method [17], [51]. Briefly, we used the Tassel 5.0 software with the mixed linear model (MLM), which integrates kinship and population structure matrices [52]. SNPs with a minor allele frequency (MAF) below 0.05 were filtered out during the GWAS analysis. We then selected the SNPs used to construct the ML models by screening various p-value thresholds obtained from the GWAS analysis.
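The two-stage marker selection (an MAF filter followed by a GWAS p-value threshold) can be illustrated with a small sketch. It assumes homozygous single-letter allele calls with `N` for missing data and takes the p-values as given, since the association test itself was run in Tassel. The function names, SNP identifiers, and toy values are hypothetical. The sketch retains SNPs with MAF ≥ 0.05, matching the filter stated in Section 3.2.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """MAF of one SNP from a list of allele calls; 'N' marks a missing call."""
    observed = [c for c in calls if c != "N"]
    counts = Counter(observed)
    if len(counts) < 2:
        return 0.0  # monomorphic or entirely missing
    # Frequency of the second most common allele.
    return counts.most_common()[1][1] / len(observed)


def select_snps(genotypes, p_values, p_threshold, min_maf=0.05):
    """Return SNP ids with MAF >= min_maf and GWAS p-value <= p_threshold."""
    selected = []
    for snp_id, calls in genotypes.items():
        if minor_allele_frequency(calls) < min_maf:
            continue  # dropped by the MAF filter
        if p_values[snp_id] <= p_threshold:
            selected.append(snp_id)
    return selected


# Toy data: three SNPs typed on ten accessions (illustrative values only).
genotypes = {
    "snp1": list("AAAAATTTTT"),  # MAF 0.5
    "snp2": list("GGGGGGGGGG"),  # monomorphic, removed regardless of p-value
    "snp3": list("CCCCCCCTTN"),  # MAF 2/9 after ignoring the missing call
}
p_values = {"snp1": 5e-4, "snp2": 1e-6, "snp3": 8e-3}

strict = select_snps(genotypes, p_values, p_threshold=1e-3)
relaxed = select_snps(genotypes, p_values, p_threshold=1e-2)
print(strict, relaxed)
```

Relaxing the threshold can only add markers, so the SNP set at p ≤ 1.0 × 10⁻² always contains the set at p ≤ 1.0 × 10⁻³.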

2.4. Selection of rice and wheat accessions for ML

Two methods were employed to select rice and wheat accessions for training the ML model. The first method involved random selection, where rice or wheat accessions were randomly chosen as the training datasets, and the remaining accessions were used as the test datasets. The second method selected rice or wheat accessions from the kinship tree. To do this, a pair-wise distance tree was constructed, and representative accessions were chosen using a Perl script, avoiding the selection of multiple accessions with very close relationships. With a 3:1 ratio for the training and test sets, rice or wheat population varieties sharing the same genetic distance were clustered according to kinship to determine the composition of the training set. All phylogenetic trees were constructed using molecular evolutionary genetics analysis (MEGA) v7.0 [53]. The phylogenetic tree visualization was enhanced with tree visualization by one table (tvBOT) for improved clarity and presentation [54].
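One possible reading of the kinship-based selection is sketched below. It is not the Perl script used in the study, which is not reproduced in the text. The greedy clustering rule, the `cutoff` parameter, and the function names are assumptions made purely for illustration. Accessions are grouped by pair-wise distance, and each group contributes to the training and test sets at a 3:1 ratio, so that both sets cover the same branches of the kinship tree.

```python
def cluster_by_distance(names, dist, cutoff):
    """Greedy clustering (an assumed rule): an accession joins the first
    cluster whose founder lies within `cutoff`, else it founds a new one."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if dist[frozenset((name, cluster[0]))] <= cutoff:
                cluster.append(name)
                break
        else:  # no existing cluster was close enough
            clusters.append([name])
    return clusters


def kinship_split(names, dist, cutoff, train_per_test=3):
    """Within every cluster, send each (train_per_test + 1)-th member to test."""
    train, test = [], []
    for cluster in cluster_by_distance(names, dist, cutoff):
        for i, name in enumerate(cluster):
            if i % (train_per_test + 1) == train_per_test:
                test.append(name)
            else:
                train.append(name)
    return train, test


# Toy pair-wise distances: two tight groups ("a*" and "b*") far from each other.
names = ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]
dist = {}
for i, x in enumerate(names):
    for y in names[i + 1:]:
        dist[frozenset((x, y))] = 0.1 if x[0] == y[0] else 0.9

train, test = kinship_split(names, dist, cutoff=0.2)
print(train)
print(test)
```

With this rule, clusters of fewer than four accessions go entirely to the training set; a real implementation would need an explicit policy for such small groups.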

2.5. Construction of the ML models used for comparison

2.5.1. Random forest classification (RFC) and RFC_K models

The RFC algorithm, a robust ML algorithm based on decision trees, fits multiple decision trees to different subsets of the dataset. It obtains predictions from each tree and enhances prediction accuracy through a voting mechanism [55]. In this study, we chose to import the RandomForestClassifier from the sklearn package of Python with default parameters for establishing the RFC prediction classification model. To incorporate kinship into the training set, we established the RFC_K model [56].

2.5.2. Support vector classifier (SVC) and SVC_K model

Support vector machines (SVMs) have become a powerful method for data classification and regression. SVC, a variant of SVM, determines decision functions directly from training data by maximizing the margin between decision boundaries in a high-dimensional feature space. This classification strategy minimizes classification errors and improves the generalization ability, especially when dealing with limited input data [57], [58]. In this study, we imported the SVC module from the sklearn package in Python, employing default parameters. To incorporate kinship and control the training set, we established the SVC_K model.

2.5.3. Light gradient boosting machine (lightGBM) and lightGBM_K model

LightGBM, an algorithm based on gradient boosting decision trees (GBDT), optimizes training speed and memory usage through histogram-based algorithms [59]. It grows trees leaf-wise and applies a maximum depth limit on top of this leaf-wise growth to prevent overfitting while maintaining high efficiency. In this study, the lightGBM model was created using the LGBMClassifier from the lightgbm package in Python with default parameters. Kinship was additionally used to select the training set, which established the lightGBM_K model.
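As a rough illustration of how such classifiers are instantiated with default parameters, the sketch below fits `RandomForestClassifier` and `SVC` from scikit-learn on synthetic data. The genotype matrix, the phenotype rule, and the train/test split are invented for the example and do not come from the study. A fixed `random_state` is the only departure from the defaults and is added here for reproducibility. `LGBMClassifier` from the lightgbm package follows the same `fit`/`score` interface and could be added to the dictionary if that package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_accessions, n_snps = 80, 20

# Synthetic 0/1 SNP codes; the phenotype simply copies the first SNP
# (1 = resistant, 0 = susceptible) so that there is a signal to learn.
X = rng.integers(0, 2, size=(n_accessions, n_snps))
y = X[:, 0]

# Simple positional split (illustrative; not the kinship-based selection).
X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

models = {
    "RFC": RandomForestClassifier(random_state=0),  # defaults otherwise
    "SVC": SVC(),  # default RBF kernel
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # fraction correct

print(accuracy)
```

In the plus kinship variants described above, the classifier itself is unchanged; what differs is which accessions are placed in `X_train`.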

2.5.4. Deep neural network genomic prediction (DNNGP) model

DNNGP is based on convolutional neural network (CNN) architecture, consisting of one input layer, three convolutional layers, one batch normalization layer, two dropout layers, one flattening layer, one dense layer, and one output layer. The model receives genomic markers (or other omics data) as input, with CNN layers serving as key components for computation. DNNGP applies rectified linear unit (ReLU) activation functions, L2 regularization, batch normalization, and dropout layers to enhance model robustness and prevent overfitting. The architecture includes a callback function for early stopping, contributing to model optimization during training [60].

2.5.5. Densely connected convolutional networks (DenseNet) model

DenseNet201 is a CNN architecture in which each layer is connected to all other layers within a dense block [61]. This design effectively addresses the vanishing gradient problem, facilitates feature propagation, and minimizes the model size by reusing numerous features with a limited number of convolution kernels. In a DenseNet201 network consisting of L layers, the output of each layer serves as input for all subsequent layers, resulting in a total of L(L + 1)/2 connections [62]. This connectivity not only alleviates gradient dissipation during training but also contributes to a more compact model with improved generalization. For individual genotypes, we first employed the one-hot encoding method to convert the GWAS-identified significant SNPs into codes (A = '0001'; T = '0010'; G = '0100'; C = '1000'; R/Y/M/K/S/W/N = '0000'). Second, we utilized a Perl SVG module script to transform the codes into a visual representation. The phenotype (susceptibility (S) or resistance (R)) of an individual was then employed to classify these visual representations. For the CNN training framework, we utilized a DenseNet121 network (with appropriate modifications), which is a neural network consisting of 121 layers. All analyses were conducted in the PyTorch framework on a Linux system.
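The one-hot coding scheme given above can be written out directly. The sketch below covers only the encoding step. The subsequent rendering into images, done with a Perl SVG script in the study, is not reproduced, and the function name `encode_genotype` is hypothetical.

```python
# Codes as stated in the text; ambiguous or missing calls map to all zeros.
ONE_HOT = {"A": "0001", "T": "0010", "G": "0100", "C": "1000"}
AMBIGUOUS = set("RYMKSWN")


def encode_genotype(calls):
    """Turn a sequence of SNP calls into a flat list of 0/1 integers."""
    bits = []
    for call in calls:
        if call in ONE_HOT:
            code = ONE_HOT[call]
        elif call in AMBIGUOUS:
            code = "0000"
        else:
            raise ValueError(f"unexpected allele call: {call!r}")
        bits.extend(int(b) for b in code)
    return bits


encoded = encode_genotype("ATGCN")
print(encoded)
# [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Each SNP contributes four bits, so an individual with n selected SNPs is represented by a vector of length 4n.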

2.6. Model validation

Model validation was performed using a ten-fold cross-validation (CV) approach. The training population was divided into ten folds of equal size using the stratified fold function from the sklearn package in Python (CV = 10) [56]. In each CV experiment, one fold was held out for testing while the remaining nine folds made up the training population, so that every sample was used for both training and testing across the ten experiments [63]. Default parameters were employed for the model to maintain consistency and repeatability, avoiding any potential subjectivity introduced during parameter tuning.
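A stratified ten-fold CV loop of this kind might look as follows. The sketch assumes that the "stratified fold function" refers to scikit-learn's `StratifiedKFold`. The data are synthetic, and shuffling with a fixed seed is an illustrative choice rather than a documented setting of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(1)

# Synthetic data: 100 accessions, 15 SNPs, 60 susceptible (0) and 40 resistant (1).
y = np.array([0] * 60 + [1] * 40)
X = rng.integers(0, 2, size=(100, 15))
X[:, 0] = y  # the first SNP copies the label so there is a signal to learn

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

fold_accuracy = []
fold_class_counts = []
for train_idx, test_idx in skf.split(X, y):
    model = RandomForestClassifier(random_state=1)
    model.fit(X[train_idx], y[train_idx])  # nine folds for training
    fold_accuracy.append(model.score(X[test_idx], y[test_idx]))  # one fold held out
    # Stratification keeps the class ratio of the full dataset in every fold.
    fold_class_counts.append(tuple(np.bincount(y[test_idx], minlength=2)))

mean_accuracy = float(np.mean(fold_accuracy))
print(fold_class_counts[0], round(mean_accuracy, 3))
```

Because the folds are stratified, each held-out fold contains six susceptible and four resistant samples, the same 60:40 ratio as the full toy dataset.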

For the classification models (RFC, RFC_K, SVC, SVC_K, lightGBM, and lightGBM_K in this study), commonly used metrics for evaluating model performance include accuracy, precision, and recall. Accuracy represents the proportion of correctly predicted samples out of the total samples. Precision is the proportion of positive predictions that are correct. Recall is the proportion of actual positive instances that are captured. The F1-score is the harmonic mean of precision and recall, providing a balanced assessment of both. For the evaluation of deep learning (DL) methods (DNNGP and DenseNet), accuracy is the metric used to assess their performance. We evaluated the models based on their mean accuracy on test datasets to determine their generalizability.

$\text { Accuracy }=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$
$\text { Precision }=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$
$\text { Recall }=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$
$\text { F1-score }=\frac{2 \times \text { Precision } \times \text { Recall }}{\text { Precision }+ \text { Recall }}$

where TP stands for true positive; TN is true negative; FP is false positive; FN is false negative. Additionally, the receiver operating characteristics (ROC) graph was employed as a visual representation of classifier performance, representing the true positive rate versus the false positive rate [64]. The area under the curve (AUC) functions as a quantitative measure for assessing classifier performance, ranging from 0 to 1, where a value closer to 1 signifies superior performance. The AUC value is widely recognized as a coefficient for assessing classifier predictability [65]. All ROC graphs were generated by importing roc_curve and roc_auc_score functions from the sklearn package in Python, with default parameters [56].
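The four formulas above, together with the AUC, can be checked on a small hand-made example. The sketch below is written in plain Python rather than with the sklearn functions used in the study. It computes the AUC from its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative), which equals the area under the ROC curve. The labels, predictions, and scores are invented for illustration, with resistant (R) treated as the positive class.

```python
def classification_metrics(y_true, y_pred, positive="R"):
    """Accuracy, precision, recall, and F1-score from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_from_scores(y_true, scores, positive="R"):
    """AUC as the share of positive/negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 4 resistant and 4 susceptible accessions.
y_true = ["R", "R", "R", "S", "S", "S", "S", "R"]
y_pred = ["R", "R", "S", "S", "S", "R", "S", "R"]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]  # predicted P(resistant)

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics, auc)
```

Here TP = 3, TN = 3, FP = 1, and FN = 1, so accuracy, precision, recall, and F1-score all equal 0.75, while 15 of the 16 positive/negative pairs are ranked correctly, giving an AUC of 0.9375.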

3. Results

3.1. ML methods for predicting plant disease resistance

We developed an analysis pipeline that includes dataset establishment (Figs. 1(a) and (b)), construction of the plus kinship ML models RFC_K, SVC_K, and lightGBM_K (Fig. 1(c)), validation of these models (Fig. 1(d)), application of these models to other independent rice populations, and spray inoculation for phenotype evaluation (Fig. 1(e)). First, taking RB disease resistance prediction as an example, we established the dataset containing the RB disease resistance phenotypes of the RDPI accessions selected by the pair-wise genetic distance matrix; the phenotype-associated SNP markers were chosen based on the results obtained from GWAS [17] (Figs. 1(a) and (b)). Next, we used the datasets to construct three plus kinship ML models by adding the K (a pair-wise distance matrix for population kinship) of the population (Fig. 1(c)). Then we used a ten-fold CV, ROC curve, and the prediction accuracy of RB to evaluate the performance of these models (Fig. 1(d)). We used the final plus kinship model to predict the RB disease resistance of another independent rice population, RDPII. In addition, we performed a spray inoculation of RDPII accessions and assessed their resistance levels to validate the prediction (Fig. 1(e)).

3.2. SNP selection and establishment of the best p-value thresholds

To investigate the genetic basis of disease resistance in rice and wheat, we performed a GWAS using filtered SNPs (undetermined SNPs < 5% and MAF ≥ 0.05) and the RB, RBSDV, and RSB resistance phenotypes of the RDPI rice population [6], [17], [47] and the WB, WSR resistance phenotypes of the wheat population [48]. The SNPs associated with rice and wheat disease resistance are presented in Tables S1 and S2 in Appendix A, respectively. To avoid ambiguous results, only the highly resistant and the highly susceptible rice and wheat accessions were used to establish the ML models. The criteria of classification and the number of varieties being used to train the ML classification models are summarized in Table S3 in Appendix A.

Firstly, we constructed the RFC, SVC, and lightGBM models for RB resistance prediction to determine the suitable p-value thresholds. We integrated phenotype data for RB resistance and the GWAS-identified SNPs into the RFC algorithm to train the models. The SNPs were determined based on the different p-value thresholds from the GWAS, including p ≤ 1.0 × 10⁻⁴, p ≤ 1.0 × 10⁻³, p ≤ 1.0 × 10⁻², p ≤ 2.0 × 10⁻², p ≤ 3.0 × 10⁻², p ≤ 4.0 × 10⁻², and p ≤ 5.0 × 10⁻². A total of 142 SNPs were found to be associated with resistance to both strains RO1-1 and RB22 under p ≤ 1.0 × 10⁻² (Table S4 in Appendix A).

To assess the accuracy of predicting RB resistance, we employed ten-fold CV for the three classification models. This approach helps alleviate overfitting issues that may arise when training models on limited training data [63], [66]. We obtained performance metrics for each of the ten folds to comprehensively evaluate the average performance of the model. These metrics include accuracy, precision, recall, F1-score, and the AUC within the ROC graph (Tables S5 and S6 in Appendix A). Our analysis revealed that the RFC, SVC, and lightGBM models exhibited the highest prediction accuracies when constructed using thresholds of p ≤ 1.0 × 10⁻³ and p ≤ 1.0 × 10⁻². Specifically, with RB strain RO1-1, the average prediction accuracies for RFC were 90.90% and 92.02% at the two thresholds, respectively. For SVC, the corresponding accuracies were 93.53% and 93.18%, while for lightGBM, they were 91.61% and 91.25% (Fig. S1(a) in Appendix A). With RB strain RB22, the mean accuracies at the corresponding thresholds were 89.79% and 90.20% for RFC, 91.98% and 93.10% for SVC, and 89.78% and 91.65% for lightGBM (Fig. S1(b) in Appendix A). The mean accuracies and mean AUC values with RO1-1 showed a decreasing trend across the following p-value thresholds: p ≤ 1.0 × 10⁻², p ≤ 2.0 × 10⁻², p ≤ 3.0 × 10⁻², p ≤ 4.0 × 10⁻², and p ≤ 5.0 × 10⁻²; while with RB22, they slightly increased or remained unchanged (Tables S5 and S6; Fig. S1). Consequently, we selected p ≤ 1.0 × 10⁻³ and p ≤ 1.0 × 10⁻² for further analysis.
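To make the comparison between the two retained thresholds concrete, the short sketch below tabulates the mean accuracies quoted in the paragraph above and computes the change, in percentage points, when moving from p ≤ 1.0 × 10⁻³ to p ≤ 1.0 × 10⁻². It uses only the figures reported in the text; the variable names are arbitrary.

```python
# Mean ten-fold CV accuracies (%) quoted in the text,
# stored as (p <= 1.0e-3, p <= 1.0e-2) for each model and strain.
reported = {
    "RO1-1": {"RFC": (90.90, 92.02), "SVC": (93.53, 93.18),
              "lightGBM": (91.61, 91.25)},
    "RB22": {"RFC": (89.79, 90.20), "SVC": (91.98, 93.10),
             "lightGBM": (89.78, 91.65)},
}

changes = {}
for strain, by_model in reported.items():
    for model, (acc_1e3, acc_1e2) in by_model.items():
        # Positive values mean the looser threshold was more accurate.
        changes[(strain, model)] = round(acc_1e2 - acc_1e3, 2)

for key, delta in changes.items():
    print(key, f"{delta:+.2f}")

largest_shift = max(abs(d) for d in changes.values())
print("largest absolute change:", largest_shift)
```

The two thresholds differ by less than two percentage points for every model and strain, which is consistent with carrying both forward rather than choosing one.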

3.3. Adding kinship to the classifiers significantly improves prediction accuracy

To enhance the prediction accuracy of the classifiers, we used population kinship (the pair-wise distance matrix) to select specific varieties for the training set instead of selecting varieties at random, and named the resulting models RFC_K, SVC_K, and lightGBM_K (refer to Tables S7 and S8 in Appendix A for more information). Using the optimal p-value thresholds of p ≤ 1.0 × 10⁻³ and p ≤ 1.0 × 10⁻², we compared the prediction accuracies of the RFC, RFC_K, SVC, SVC_K, lightGBM, lightGBM_K, DNNGP, and DenseNet models on three rice diseases and two wheat diseases. All models were used with their default parameters (Fig. 2, Fig. 3).

Specifically, when using the p-value threshold of p ≤ 1.0 × 10⁻³, the plus kinship models (RFC_K, SVC_K, and lightGBM_K) consistently outperformed the other five models in terms of mean prediction accuracy, achieving values exceeding 95% with both the RB22 and RO1-1 strains of RB disease, with a mean AUC of 0.99. By comparison, the mean accuracies of the other models were approximately 90% (RFC), 92% (SVC), 90% (lightGBM), 88% (DNNGP), and 90% (DenseNet) with RB22, and approximately 91% (RFC), 94% (SVC), 92% (lightGBM), 89% (DNNGP), and 92% (DenseNet) with RO1-1 (see Figs. 2(a) and (b), Tables S5 and S6). At the p-value threshold of p ≤ 1.0 × 10⁻², mean accuracies significantly increased from RFC, SVC, and lightGBM to RFC_K, SVC_K, and lightGBM_K, respectively. Among these models, SVC_K exhibited the highest accuracy of 96% with RB22 and 98% with RO1-1 (Figs. 2(a) and (b); Tables S5 and S6).

The differences in prediction accuracy with RBSDV across different locations in the RDPI can be attributed to environmental variations. When examining the mean prediction accuracy with RBSDV disease in Henan Province, RFC and SVC emerged as the most effective methods with an accuracy of 86% at the p-value thresholds of p ≤ 1.0 × 10⁻³ and p ≤ 1.0 × 10⁻², respectively (see Fig. 2(c) and Fig. 3(c)). In Yutai, RFC_K outperformed other methods and achieved the highest accuracy of 79% at both thresholds; SVC_K was the second-highest performing method, with an accuracy of 78% (p ≤ 1.0 × 10⁻³) and 76% (p ≤ 1.0 × 10⁻²) (see Figs. 2(d) and 3(d)).

With the 1-YN7, 2-MH12, and A-GN43 strains of RSB disease, the SVC_K method exhibited the highest performance, with mean accuracies of approximately 88% (p ≤ 1.0 × 10⁻³) and 91% (p ≤ 1.0 × 10⁻²) for the 1-YN7 strain (Figs. 2(e) and 3(e)). With the 2-MH12 and A-GN43 strains at p ≤ 1.0 × 10⁻³, the RFC_K method showed the highest accuracy (85%), which was comparable to SVC_K (83%) and lightGBM_K (85% for 2-MH12, 83% for A-GN43) (Figs. 2(f) and (g)). At p ≤ 1.0 × 10⁻², lightGBM_K was the best method (83% for 2-MH12, 88% for A-GN43); these two plus kinship models showed significant increases both with 2-MH12 and A-GN43 (Figs. 3(f) and (g)).

In the prediction of wheat disease resistance, RFC, RFC_K, SVC, SVC_K, lightGBM, and lightGBM_K achieved identical mean prediction accuracies of 90% for WB. Additionally, DNNGP and DenseNet also had 90% accuracy at p ≤ 1.0 × 10⁻³, showing an increase of 3% and a decrease of 1%, respectively, compared to p ≤ 1.0 × 10⁻² (Figs. 2(h) and 3(h)). For WSR, RFC_K, SVC_K, and lightGBM_K demonstrated superior performance with a mean accuracy of 93% at both thresholds (Figs. 2(i) and 3(i)).

Overall, the inclusion of kinship in the classifiers significantly improves the prediction accuracies of the RFC, SVC, and lightGBM models, resulting in better performance in plant disease resistance prediction.

3.4. The performance and interpretability of the RFC_K model for RB disease prediction

The inclusion of kinship (RFC_K, SVC_K, and lightGBM_K) in the models significantly improved their performance in predicting rice resistance against RB disease strains RO1-1 and RB22, with a mean AUC value of 0.99 under the p-value threshold p ≤ 1.0 × 10⁻² (Figs. 3(a) and (b), Tables S5 and S6). Subsequent research focused on these three models incorporating K. The RFC_K model achieved the highest mean AUC values at 0.9975 for RO1-1 and 0.9966 for RB22. This indicates that the RFC_K model demonstrates both high predictive accuracy and high reliability. To further validate the predictions, we compared the inoculation phenotype and prediction phenotype, and the results of RFC_K are presented in the phylogenetic trees for RO1-1 (Fig. 4(a) and Table S5) and RB22 (Fig. 4(b) and Table S5).

To investigate the relationship between the SNPs and plant disease resistance in the RFC_K model, we assessed the feature importance [55]. We observed that SNPs with high feature importance are located within regions that are likely to have a significant role in RB disease resistance (Figs. 4(c) and (d)). Conducting further analysis on these regions could facilitate the identification and cloning of novel RB resistance genes.

3.5. The classifiers plus kinship models have high prediction accuracy for RB resistance in another independent rice population

We tested the generalizability of the RDPI-trained RFC_K, SVC_K, and lightGBM_K models in another large rice population, namely RDPII [49]. The results of the comprehensive analysis of the prediction are summarized in Table S9 in Appendix A. To evaluate the accuracy of the prediction, we conducted a spray inoculation of the RDPII accessions with RO1-1 and RB22 and determined their disease scores (Table S9). To avoid any ambiguous results, we only selected the accessions with high resistance scores (score: 0-3) and high susceptibility scores (score: 6-9) for the evaluation of the RFC_K, SVC_K, and lightGBM_K models.

The final prediction models (RFC, SVC, and lightGBM) were selected based on their highest accuracies and AUC values obtained through ten-fold CV (Table S10 in Appendix A). Then, we employed these final prediction models (RFC, SVC, and lightGBM) and their corresponding plus kinship models (RFC_K, SVC_K, and lightGBM_K) to predict the RB resistance (we tested two strains: RB22 and RO1-1) of RDPII (Fig. 5). In predicting the RB22 resistance, RFC_K improved the accuracy of RFC from 89.10% to 90.36%, accompanied by an increase in AUC from 0.70 to 0.73. However, SVC_K and lightGBM_K did not enhance accuracy compared with SVC and lightGBM, maintaining accuracies at 89.31% and 87.63%, respectively. The AUC increased from 0.76 to 0.78 for the SVC-SVC_K comparison and from 0.71 to 0.74 for the lightGBM-lightGBM_K comparison (Figs. 5(a), (d), and (f)). When predicting resistance against RO1-1, all three plus kinship models showed improvement. The accuracy increased from 90.49% (RFC) to 91.10% (RFC_K), from 89.67% (SVC) to 90.69% (SVC_K), and from 89.47% (lightGBM) to 90.01% (lightGBM_K). Additionally, the AUC increased by 0.02 to 0.04 (Fig. 5(b), (d), and (f), Table S11 in Appendix A). The RB disease resistance predictions for 476 and 492 varieties are shown in the phylogenetic trees against RO1-1 and RB22, respectively, using the RFC_K model. These results were compared with the outcomes from the spray inoculation (Fig. 5(c)). Furthermore, representative susceptible and resistant rice accessions for RB strains RO1-1 and RB22 are presented in Figs. 5(g) and (h), respectively, confirming the high consistency between the prediction and spray inoculation results.

4. Discussion

Compared to traditional phenotypic selection in plant breeding, GS often uses SNPs to predict phenotypes for desirable traits. It shortens the breeding process by enabling selection prior to phenotype determination, which results in a significant reduction in economic costs and time. Additionally, GS allows for the evaluation of more breeding candidates, leading to higher selection intensity and the potential for greater genetic improvement [67], [68]. This is particularly beneficial to the improvement of disease resistance in crops; for example, a cycle of traditional breeding for RB disease resistance may require seven seasons: two seasons of phenotyping to select the best parents and five seasons of selfing after the cross. By contrast, a cycle of GS requires only two generations and could be based on genotypes [44].

ML techniques applied to GS have been shown to improve and optimize the plant breeding process. To achieve the highest prediction accuracy in GS, plant breeders experiment with multiple models. Nevertheless, the optimal model varies depending on the specific trait and crop [69]. In plant disease resistance prediction, previous studies have shown that classifiers such as RFC and SVC outperform regression-based models when applied to maize and wheat populations [70], [71]. LightGBM, a parallel voting decision tree algorithm, has shown excellent performance in classification problems [68]. Additionally, DL-based GS methods have been introduced as effective approaches for predicting phenotypes from genotype matrices. One promising method is DNNGP, which incorporates multi-omics data to predict agronomic traits in GS and is built upon a CNN architecture [60]. DenseNet201 is another CNN-based method that introduces a dense connectivity pattern [61]. In this study, we selected RFC, SVC, lightGBM, DNNGP, and DenseNet to predict plant disease resistance. Our results demonstrated that the plus kinship models (RFC_K, SVC_K, and lightGBM_K) significantly improve the predictive capability of GS for rice resistance against RB, RBSDV, and RSB diseases, and for wheat resistance to WB and WSR (Fig. 2, Fig. 3, Table S6). Notably, the accuracies achieved by the plus kinship models in this study surpass most previous reports for disease resistance in crops, and the plus kinship models significantly enhance the prediction accuracies of the RFC, SVC, and lightGBM models. This superiority, particularly over DNNGP and DenseNet, may be attributed to the fact that these classifier models do not require very large breeding datasets, whereas DL models such as DNNGP and DenseNet rely on very large datasets.

A high prediction accuracy is required for the successful application of GS. Even a small improvement in predictive ability can translate into a significant trait gain under strong selection intensity [22]. In addition to the choice of GS model, the disease resistance prediction accuracies of the plus kinship models were greatly improved in this study by adjusting the number of markers and the composition of the training set. It has been reported that GWAS-based selection of SNPs increases prediction accuracy compared with random SNP selection [72]. We found that the choice of GWAS-associated SNP markers has only a slight effect on accuracy within the threshold interval from p ≤ 1.0 × 10⁻³ to p ≤ 1.0 × 10⁻². Strong and potentially specific linkage disequilibrium (LD) between markers results in a narrow range of variation in prediction accuracy within this interval, typically from 0.001 to 0.003. This means that even a low marker density of only a few hundred to a few thousand markers across a genome can achieve excellent prediction accuracy in breeding populations [73]. The number of SNPs used in this study ranged from 72 to 973 at p ≤ 1.0 × 10⁻³ and from 224 to 10 283 at p ≤ 1.0 × 10⁻² (Table S6).
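The threshold-based marker selection discussed above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the SNP identifiers, p-values, and genotype codes (0/1/2) are invented toy values, and the helper names are hypothetical.

```python
def select_snps(pvalues, threshold):
    """Return the SNP identifiers whose GWAS p-value is at or below the threshold."""
    return sorted(snp for snp, p in pvalues.items() if p <= threshold)


def subset_genotypes(rows, snp_order, selected):
    """Keep only the genotype columns that belong to the selected SNPs.

    rows      -- list of genotype vectors, one per accession
    snp_order -- SNP identifiers giving the column order of each row
    selected  -- SNP identifiers to retain
    """
    wanted = set(selected)
    keep = [i for i, snp in enumerate(snp_order) if snp in wanted]
    return [[row[i] for i in keep] for row in rows]


# Toy GWAS results (assumption): SNP identifier -> p-value.
pvalues = {
    "snp_01": 2.0e-5,
    "snp_02": 8.0e-4,
    "snp_03": 4.0e-3,
    "snp_04": 9.9e-3,
    "snp_05": 3.0e-2,
    "snp_06": 1.0e-3,
}

strict = select_snps(pvalues, 1.0e-3)  # stricter threshold, fewer markers
loose = select_snps(pvalues, 1.0e-2)   # looser threshold, superset of the above

snp_order = sorted(pvalues)
genotypes = [
    [0, 2, 2, 0, 1, 2],
    [2, 0, 0, 2, 1, 0],
]
strict_matrix = subset_genotypes(genotypes, snp_order, strict)
print(len(strict), len(loose), strict_matrix)
```

Every marker that passes the stricter threshold also passes the looser one, so moving between the two thresholds only adds or removes the weaker associations; when those markers are in strong LD with markers already selected, little new information is gained.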

The determination of training set size and composition is crucial for maximizing the accuracy of GS. Previous studies have reported that larger training populations lead to higher accuracy in genomic prediction [72], [74]. However, an optimal training population size can balance prediction accuracy against genotyping and phenotyping costs. Additionally, population structure and genetic relationships can influence the predictive accuracy of GS models [75]. To improve model performance, this study considered the kinship-based genetic distance between varieties when determining the size and composition of the training set. The plus kinship models (RFC_K, SVC_K, and lightGBM_K) showed higher prediction accuracies than the RFC, SVC, and lightGBM models and the DNNGP and DenseNet models within the threshold interval from p ≤ 1.0 × 10⁻³ to p ≤ 1.0 × 10⁻². For RB disease prediction, the plus kinship models achieved accuracies of up to 95%, while for RBSDV and RSB diseases, they achieved up to 85%. Similarly, for WB and WSR diseases, the models achieved mean accuracies of up to 90% and 93%, respectively. The accuracy of the plus kinship models can be affected by various factors, including the size and quality of the dataset, the selection of features and variables used for prediction, and the complexity of the disease under investigation.
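One way to picture how genetic relatedness can inform training set composition is sketched below. It is a simplified illustration and should not be read as the plus kinship procedure developed in this study: identity-by-state (IBS) similarity is swapped in as a stand-in relatedness measure, genotypes are assumed to be coded 0/1/2, and all accession names and function names are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Identity-by-state similarity between two genotype vectors coded 0/1/2.

    Returns 1.0 for identical genotypes and 0.0 when every marker is
    homozygous for opposite alleles.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return 1.0 - sum(abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def closest_training_accessions(target, candidates, k):
    """Rank candidate accessions by IBS similarity to the target and keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda name: ibs_similarity(target, candidates[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy genotypes (assumption): four markers per accession.
target = [0, 2, 2, 0]
candidates = {
    "acc_A": [0, 2, 2, 0],  # identical to the target
    "acc_B": [0, 2, 0, 0],  # differs at one marker by two allele copies
    "acc_C": [2, 0, 0, 2],  # opposite homozygote at every marker
    "acc_D": [0, 1, 2, 0],  # differs at one marker by one allele copy
}

training = closest_training_accessions(target, candidates, k=2)
print(training)
```

Restricting or weighting the training set toward accessions related to the prediction target is one common motivation for using kinship information; population structure that is ignored can otherwise inflate or depress apparent accuracy.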

We demonstrated that the plus kinship ML models (RFC_K, SVC_K, and lightGBM_K) significantly improve the prediction accuracies for rice and wheat disease resistance under laboratory and field conditions. The application of ML methods for GS will accelerate the identification of new resistant resources or varieties and reduce the time and cost of phenotyping. This study represents a significant step forward in unraveling the intricate connection between genotype and disease resistance phenotype. Furthermore, it provides a reference for predicting phenotypes in other crops that possess unknown or less characterized disease-resistance genes. Integrating new technologies and data sources into ML models for GS can further improve their prediction accuracy. Notable examples include large-scale phenotypic data obtained through high-throughput phenotyping in controlled environments and field conditions, and the increasing affordability and accuracy of next-generation sequencing for genotyping [76], [77]. By combining suitable and optimized models, cost-effective genotyping technologies, precise high-throughput phenotyping platforms, and well-designed experiments, ML-based GS has the potential to accelerate the development of elite crop varieties with high yields and robust resistance against pathogens.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (32261143468), the National Key Research and Development (R&D) Program of China (2021YFC2600400), the Seed Industry Revitalization Project of Jiangsu Province (JBGS(2021) 001), and the Project of Zhongshan Biological Breeding Laboratory (BM2022008-02). We appreciate the initiative of the International Rice Research Institute in establishing the RDP2 rice variety pool and thank Drs. Bin Liu and Junliang Zhao at the Rice Research Institute, Guangdong Academy of Agricultural Sciences, China, for providing the rice seeds used in this study. We also thank Dr. Pawan K. Singh and Dr. Xinyao He at the International Maize and Wheat Improvement Center for providing valuable suggestions for this study.

Compliance with ethics guidelines

Qi Liu, Shi-min Zuo, Shasha Peng, Hao Zhang, Ye Peng, Wei Li, Yehui Xiong, Runmao Lin, Zhiming Feng, Huihui Li, Jun Yang, Guo-Liang Wang, and Houxiang Kang declare that they have no conflicts of interest or financial conflicts to disclose.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eng.2024.03.014.

References

[1]

Lee FN. Rice sheath blight: a major rice disease. Plant Dis 1983; 67(7):829.

[2]

Skamnioti P, Gurr SJ. Against the grain: safeguarding rice from rice blast disease. Trends Biotechnol 2009; 27(3):141-50.

[3]

Zhou T, Du L, Wang L, Wang Y, Gao C, Lan Y, et al. Genetic analysis and molecular mapping of QTLs for resistance to rice black-streaked dwarf disease in rice. Sci Rep 2015; 5:10509.

[4]

Schwessinger B. Fundamental wheat stripe rust research in the 21st century. New Phytol 2017; 213(4):1625-31.

[5]

Ceresini PC, Castroagudín VL, Rodrigues F, Rios JA, Aucique-Pérez CE, Moreira SI, et al. Wheat blast: past, present, and future. Annu Rev Phytopathol 2018; 56:427-56.

[6]

Chen Z, Feng Z, Kang H, Zhao J, Chen T, Li Q, et al. Identification of new resistance loci against sheath blight disease in rice through genome-wide association study. Rice Sci 2019; 26(1):21-31.

[7]

Li W, Chern M, Yin J, Wang J, Chen X. Recent advances in broad-spectrum resistance to the rice blast disease. Curr Opin Plant Biol 2019; 50:114-20.

[8]

Sun S, Zhou Y, Chen J, Shi J, Zhao H, Zhao H, et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat Genet 2018; 50(9):1289-95.

[9]

Thomas WJW, Zhang Y, Amas JC, Cantila AY, Zandberg JD, Harvie SL, et al. Innovative advances in plant genotyping. In: Shavrukov Y, editor. Plant genotyping. Berlin: Springer; 2023. p. 451-65.

[10]

Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 2018; 557(7703):43-9.

[11]

Burghardt LT, Young ND, Tiffin P. A guide to genome-wide association mapping in plants. Curr Protoc Plant Biol 2017; 2(1):22-38.

[12]

Cortes LT, Zhang Z, Yu J. Status and prospects of genome-wide association studies in plants. Plant Genome 2021; 14(1):e20077.

[13]

Lu J, Wang C, Zeng D, Li J, Shi X, Shi Y, et al. Genome-wide association study dissects resistance loci against bacterial blight in a diverse rice panel from the 3000 rice genomes project. Rice (N Y) 2021; 14(1):22.

[14]

Sattayachiti W, Wanchana S, Arikit S, Nubankoh P, Patarapuwadol S, Vanavichit A, et al. Genome-wide association analysis identifies resistance loci for bacterial leaf streak resistance in rice (Oryza sativa L.). Plants 2020; 9(12):1673.

[15]

Zhang F, Zeng D, Zhang CS, Lu JL, Chen TJ, Xie JP, et al. Genome-wide association analysis of the genetic basis for sheath blight resistance in rice. Rice 2019; 12(1):93.

[16]

Long W, Yuan Z, Fan F, Dan D, Pan G, Sun H, et al. Genome-wide association analysis of resistance to rice false smut. Mol Breed 2020; 40(5):46.

[17]

Kang H, Wang Y, Peng S, Zhang Y, Xiao Y, Wang D, et al. Dissection of the genetic architecture of rice resistance to the blast fungus Magnaporthe oryzae. Mol Plant Pathol 2016; 17(6):959-72.

[18]

Liu MH, Kang H, Xu Y, Peng Y, Wang D, Gao L, et al. Genome-wide association study identifies an NLR gene that confers partial resistance to Magnaporthe oryzae in rice. Plant Biotechnol J 2020; 18(6):1376-83.

[19]

Zhu D, Kang H, Li Z, Liu M, Zhu X, Wang Y, et al. A genome-wide association study of field resistance to Magnaporthe oryzae in rice. Rice 2016; 9:44.

[20]

Xu Y, Bai L, Liu M, Liu Y, Peng S, Hu P, et al. Identification of two novel rice S genes through combination of association and transcription analyses with gene-editing technology. Plant Biotechnol J 2023; 21(8):1628-41.

[21]

Su P, Kang H, Peng Q, Wicaksono WA, Berg G, Liu Z, et al. Microbiome homeostasis on rice leaves is regulated by a precursor molecule of lignin biosynthesis. Nat Commun 2024; 15(1):23.

[22]

Xu Y, Ma K, Zhao Y, Wang X, Zhou K, Yu G, et al. Genomic selection: a breakthrough technology in rice breeding. Crop J 2021; 9(3):669-77.

[23]

Crossa J, Fritsche-Neto R, Montesinos-Lopez OA, Costa-Neto G, Dreisigacker S, Montesinos-Lopez A, et al. The modern plant breeding triangle: optimizing the use of genomics, phenomics, and environics data. Front Plant Sci 2021; 12:651480.

[24]

Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 2017; 22(11):961-75.

[25]

Jeong S, Kim JY, Kim N. GMStool: GWAS-based marker selection tool for genomic prediction from genomic data. Sci Rep 2020; 10(1):19653.

[26]

Zhang Y, Zhang M, Ye J, Xu Q, Feng Y, Xu S, et al. Integrating genome-wide association study into genomic selection for the prediction of agronomic traits in rice (Oryza sativa L.). Mol Breed 2023; 43(11):81.

[27]

Wang W, Guo W, Le L, Yu J, Wu Y, Li D, et al. Integrating high-throughput phenotyping, GWAS and prediction models reveals the genetic architecture of plant height in maize. Mol Plant 2022; 16(2):354-73.

[28]

Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022; 23(1):40-55.

[29]

Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, et al. A review of deep learning applications for genomic selection. BMC Genomics 2021; 22(1):19.

[30]

Yang KK, Wu Z, Arnold FH. Machine-learning-guided directed evolution for protein engineering. Nat Methods 2019; 16(8):687-94.

[31]

Jones OT, Matin RN, van der Schaar M, Bhayankaram KP, Ranmuthu CKI, Islam MS, et al. Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: a systematic review. Lancet Digit Health 2022; 4(6):e466-76.

[32]

Najafabadi MY, Hesami M, Eskandari M. Machine learning-assisted approaches in modernized plant breeding programs. Genes 2023; 14(4):777.

[33]

Wang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. Front Plant Sci 2023; 14:1260089.

[34]

Pérez-Rodríguez P, Gianola D, González-Camacho JM, Crossa J, Manès Y, Dreisigacker S. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3 Genes Genom Genet 2012; 2(12):1595-605.

[35]

Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells M. Evaluation of genomic prediction methods for fusarium head blight resistance in wheat. Plant Genome 2012; 5(2):51-61.

[36]

Xu Y, Wang X, Ding X, Zheng X, Yang Z, Xu C, et al. Genomic selection of agronomic traits in hybrid rice using an NCII population. Rice 2018; 11:32.

[37]

Yoosefzadeh-Najafabadi M, Rajcan I, Eskandari M. Optimizing genomic selection in soybean: an important improvement in agricultural genomics. Heliyon 2022; 8(11):e11873.

[38]

Sandhu KS, Lozada DN, Zhang Z, Pumphrey MO, Carter AH. Deep learning for predicting complex traits in spring wheat breeding program. Front Plant Sci 2020; 11:613325.

[39]

Ornella L, González-Camacho JM, Dreisigacker S, Crossa J. Applications of genomic selection in breeding wheat for rust resistance. In: Periyannan S, editor. Wheat rust diseases. Berlin: Springer; 2017. p. 173-82.

[40]

Arruda MP, Brown PJ, Lipka AE, Krill AM, Thurber C, Kolb FL. Genomic selection for predicting fusarium head blight resistance in a wheat breeding program. Plant Genome 2015; 8(3): plantgenome2015.01.0003.

[41]

Technow F, Bürger A, Melchinger AE. Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. G3 Genes Genom Genet 2013; 3(2):197-203.

[42]

Montesinos-López OA, Montesinos-López JC, Singh P, Lozano-Ramirez N, Barrón-López A, Montesinos-López A, et al. A multivariate Poisson deep learning model for genomic prediction of count data. G3 Genes Genom Genet 2020; 10(11):4177-90.

[43]

Pérez-Rodríguez P, Flores-Galarza S, Vaquera-Huerta H, del Valle-Paniagua DH, Montesinos-López OA, Crossa J. Genome-based prediction of Bayesian linear and non-linear regression models for ordinal data. Plant Genome 2020; 13(2):e20021.

[44]

Huang M, Balimponya EG, Mgonja EM, McHale LK, Luzi-Kihupi A, Wang GL, et al. Use of genomic selection in breeding rice (Oryza sativa L.) for resistance to rice blast (Magnaporthe oryzae). Mol Breed 2019; 39(8):114.

[45]

Eizenga GC, Ali ML, Bryant RJ, Yeater KM, McClung AM, McCouch SR. Registration of the rice diversity panel 1 for genomewide association studies. J Plant Regist 2014; 8(1):109-16.

[46]

Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun 2011; 2:467.

[47]

Feng Z, Kang H, Li M, Zou L, Wang X, Zhao J, et al. Identification of new rice cultivars and resistance loci against rice black-streaked dwarf virus disease through genome-wide association study. Rice 2019; 12:49.

[48]

Juliana P, Poland J, Huerta-Espino J, Shrestha S, Crossa J, Crespo-Herrera L, et al. Improving grain yield, stress resilience and quality of bread wheat using large-scale genomics. Nat Genet 2019; 51(10):1530-9.

[49]

McCouch SR, Wright MH, Tung CW, Maron LG, McNally KL, Fitzgerald M, et al. Open access resources for genome-wide association mapping in rice. Nat Commun 2016; 7:10532.

[50]

Zhu X, Chen S, Yang J, Zhou S, Zeng L, et al. The identification of Pi50(t), a new member of the rice blast resistance Pi2/Pi9 multigene family. Theor Appl Genet 2012; 124:1295-304.

[51]

Mgonja EM, Balimponya EG, Kang H, Bellizzi M, Park CH, Li Y, et al. Genome-wide association mapping of rice resistance genes against Magnaporthe oryzae isolates from four African countries. Phytopathology 2016; 106(11):1359-65.

[52]

Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 2007; 23(19):2633-5.

[53]

Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 2016; 33(7):1870-4.

[54]

Xie J, Chen Y, Cai G, Cai R, Hu Z, Wang H. Tree visualization by one table (tvBOT): a web application for visualizing, modifying and annotating phylogenetic trees. Nucleic Acids Res 2023; 51(W1):W587-92.

[55]

Breiman L. Random forests. Mach Learn 2001; 45(1):5-32.

[56]

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12:2825-30.

[57]

Awad M, Khanna R. Support vector machines for classification. In: Awad M, Khanna R, editors. Efficient learning machines: theories, concepts, and applications for engineers and system designers. Berlin: Springer; 2015. p. 39-66.

[58]

Cervantes J, Garcia-Lamont F, Rodríguez-Mazahua L, Lopez A. A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 2020; 408:189-215.

[59]

Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017 Dec 4-9; Red Hook, NY, USA. ACM Digital Library; 2017.

[60]

Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Mol Plant 2023; 16(1):279-93.

[61]

Iandola F, Moskewicz M, Karayev S, Girshick R, Darrell T, Keutzer K. DenseNet: implementing efficient ConvNet descriptor pyramids. 2014. arXiv:1404.1869.

[62]

Xu W, Zhao L, Li J, Shang S, Ding X, Wang T. Detection and classification of tea buds based on deep learning. Comput Electron Agric 2022; 192:106547.

[63]

Tong H, Nikoloski Z. Machine learning approaches for crop improvement: leveraging phenotypic and genotypic big data. J Plant Physiol 2021; 257:153354.

[64]

Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett 2006; 27(8):861-74.

[65]

González-Camacho JM, Crossa J, Pérez-Rodríguez P, Ornella L, Gianola D. Genome-enabled prediction using probabilistic neural network classifiers. BMC Genomics 2016; 17:208.

[66]

Ban Z, Yuan P, Yu F, Peng T, Zhou Q, Hu X. Machine learning predicts the functional composition of the protein corona and the cellular recognition of nanoparticles. Proc Natl Acad Sci USA 2020; 117(19):10492-9.

[67]

Liu Y, Wang D, He F, Wang J, Joshi T, Xu D. Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean. Front Genet 2019; 10:1091.

[68]

Qiu Z, Cheng Q, Song J, Tang Y, Ma C. Application of machine learning-based classification to genomic selection and performance improvement. In: Huang DS, Bevilacqua V, Premaratne P, editors. Intelligent computing theories and application. Berlin: Springer; 2015. p. 412-21.

[69]

Larkin DL, Lozada DN, Mason RE. Genomic selection—considerations for successful implementation in wheat breeding programs. Agronomy 2019; 9 (9):479.

[70]

Ornella L, Pérez P, Tapia E, González-Camacho JM, Burgueño J, Zhang X, et al. Genomic-enabled prediction with classification algorithms. Heredity 2014; 112(6):616-26.

[71]

González-Camacho JM, Ornella L, Pérez-Rodríguez P, Gianola D, Dreisigacker S, Crossa J. Applications of machine learning methods to genomic selection in breeding wheat for rust resistance. Plant Genome 2018; 11(2):170104.

[72]

Cericola F, Jahoor A, Orabi J, Andersen JR, Janss LL, Jensen J. Optimizing training population size and genotyping strategy for genomic prediction using association study results and pedigree information. A case of study in advanced wheat breeding lines. PLoS One 2017; 12(1):e0169606.

[73]

Abed A, Pérez-Rodríguez P, Crossa J, Belzile F. When less can be better: how can we make genomic selection more cost-effective and accurate in barley? Theor Appl Genet 2018; 131(9):1873-90.

[74]

Zhong S, Dekkers JCM, Fernando RL, Jannink JL. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics 2009; 182(1):355-64.

[75]

Sarinelli JM, Murphy JP, Tyagi P, Holland JB, Johnson JW, Mergoum M, et al. Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel. Theor Appl Genet 2019; 132(4):1247-61.

[76]

Budhlakoti N, Kushwaha AK, Rai A, Chaturvedi KK, Kumar A, Pradhan AK, et al. Genomic selection: a tool for accelerating the efficiency of molecular breeding for development of climate-resilient crops. Front Genet 2022; 13:832153.

[77]

Yang W, Feng H, Zhang X, Zhang J, Doonan JH, Batchelor WD, et al. Crop phenomics and high-throughput phenotyping: past decades, current challenges, and future perspectives. Mol Plant 2020; 13(2):187-214.
