CN-122024825-A - Whole genome prediction method combined with corn kernel hyperspectral data

Abstract

The invention relates to a whole genome prediction method combined with corn kernel hyperspectral data, belonging to the technical field of plant breeding. The method first obtains parental genome data, hyperspectral data of the parental kernels, and phenotype data of the hybrids. The hyperspectral data are preprocessed in several ways and the optimal preprocessing method is selected. The genome and hyperspectral matrices of the hybrids are estimated from the parental data, separate genome-phenotype and hyperspectral-phenotype prediction models are constructed, and the two data types are then fused into a hyperspectral-assisted genome-phenotype prediction model. Ten-fold cross-validation and evaluation with several statistical methods show that the method significantly improves prediction accuracy for traits of corn hybrids such as plant height, ear length, ear thickness, row number, ear number, and grain moisture content, and provides effective technical support for precision breeding of corn.

Inventors

  • XU YANG
  • LU YUE
  • CHEN RUJIA
  • TAO TIANYUN
  • YANG ZEFENG
  • XU CHENWU
  • WANG XINYI
  • YU GUANGNING
  • WANG XIN
  • ZHANG YUXIANG
  • ZHOU KAI
  • YANG WENYAN
  • JIAO YUXIN
  • LIU TAO

Assignees

  • Yangzhou University (扬州大学)

Dates

Publication Date
20260512
Application Date
20260212

Claims (9)

  1. A whole genome prediction method combined with corn kernel hyperspectral data, comprising the steps of: S1, data preparation, namely acquiring genome data of the parents, hyperspectral data of the kernels of the parents, and phenotype data of the hybrids; S2, data preprocessing, namely performing quality control on the parental genome data, filtering out single nucleotide polymorphism markers with a minor allele frequency below a threshold and a missing rate above a threshold, to obtain preprocessed genome data; S3, hybrid data estimation, namely estimating a genome data matrix and a hyperspectral data matrix of the hybrids based on the preprocessed genome data and the preprocessed hyperspectral data, respectively; S4, model construction and screening, namely constructing, by a cross-validation method, a genome-phenotype prediction model and a plurality of preprocessed hyperspectral data-phenotype prediction models, respectively; S5, fusion model construction and evaluation, namely using the hyperspectral data processed by the selected optimal spectral preprocessing method to assist the preprocessed genome data in constructing a hyperspectral data-assisted genome-phenotype prediction model, and evaluating the effect of the fusion prediction model by a plurality of statistical methods.
  2. The method of claim 1, wherein the plurality of spectral preprocessing methods in step S2 comprises at least two of scaling, baseline correction, scatter correction and smoothing, and more particularly comprises at least two of centering, normalization, continuum removal, baseline correction, first derivative, second derivative, standard normal variate transformation, multiplicative scatter correction, moving average and convolution smoothing.
  3. The whole genome prediction method combined with corn kernel hyperspectral data according to claim 1, wherein the hybrid genome data matrix in step S3 is obtained by calculating the average of the male parent genome matrix and the female parent genome matrix, and the hybrid hyperspectral data matrix is obtained by calculating the average of the male parent hyperspectral data matrix and the female parent hyperspectral data matrix.
  4. The method of claim 1, wherein the cross-validation method in step S4 is ten-fold cross-validation.
  5. The whole genome prediction method according to claim 4, wherein the cross-validation in step S4 is repeated a plurality of times, the model evaluation in step S5 is repeated a plurality of times, and the average of the plurality of results is used as the final predictive ability evaluation value of the model.
  6. The method according to claim 1, wherein the hyperspectral data-phenotype prediction models in step S4 are evaluated using the LASSO statistical method, and the optimal spectral preprocessing method is selected based on the evaluation.
  7. The method of claim 1, wherein in step S5, the statistical methods for evaluating the effect of the fusion prediction model comprise at least two of GBLUP, LASSO, RKHS, BayesB, PLS and elastic net.
  8. The whole genome prediction method according to any one of claims 1-7, wherein the phenotype data comprise at least one of plant height, ear length, ear thickness, row number, ear number, and grain moisture content.
  9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-8.
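
The data-preparation steps in the claims (marker quality control, spectral preprocessing, and averaging of the parental matrices) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the 0/1/2 genotype coding with `None` for a missing call, the thresholds 0.05 and 0.2, and the choice to drop a marker when either criterion fails are all assumptions made for illustration; the claims only speak of "a threshold". Standard normal variate transformation is shown as one example of the listed preprocessing methods.

```python
import statistics

def qc_filter(genotypes, maf_min=0.05, miss_max=0.2):
    """Return indices of SNP columns that pass quality control.

    genotypes: one row per parent line, coded 0/1/2, None = missing call.
    maf_min and miss_max are illustrative thresholds (assumptions).
    A marker is dropped if it fails either criterion (assumption).
    """
    n = len(genotypes)
    keep = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        miss = 1 - len(calls) / n
        if not calls or miss > miss_max:
            continue
        freq = sum(calls) / (2 * len(calls))  # frequency of the counted allele
        maf = min(freq, 1 - freq)             # minor allele frequency
        if maf >= maf_min:
            keep.append(j)
    return keep

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and sd."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]

def mid_parent(female, male):
    """Estimate a hybrid's row as the element-wise mean of its two parents."""
    return [(f + m) / 2 for f, m in zip(female, male)]

# Toy data: four hypothetical parent lines, four SNPs.
parents = [
    [0, 2, 0, None],
    [2, 2, 0, None],
    [0, 2, 2, 0],
    [2, 2, 0, None],
]
keep = qc_filter(parents)                      # columns surviving QC
filtered = [[row[j] for j in keep] for row in parents]

# Hybrid of line 0 (female) and line 2 (male).
hybrid_geno = mid_parent(filtered[0], filtered[2])
hybrid_spec = mid_parent([0.25, 0.5], [0.75, 1.0])  # two toy spectral bands
```

In this toy set the second marker is monomorphic and the fourth is mostly missing, so only the first and third survive. For fully homozygous parents coded 0 and 2, the parental mean equals the expected genotype code of the F1 hybrid.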

Description

Whole genome prediction method combined with corn kernel hyperspectral data

Technical Field

The invention belongs to the technical field of plant breeding, and particularly relates to a whole genome prediction method combined with corn kernel hyperspectral data.

Background

Genomic selection (GS) is a core technology in current plant breeding. It uses high-density molecular markers covering the whole genome to predict individual breeding values through a statistical model, thereby significantly accelerating genetic gain. However, GS models rely mainly on DNA sequence variation and have difficulty capturing complex phenotypic variation arising from gene expression, protein modification, and interaction with the environment. For complex quantitative traits under polygenic control in particular, prediction accuracy often reaches a bottleneck.

In recent years, hyperspectral imaging (HSI) has shown great potential in agricultural phenotyping because it can acquire spectral and spatial information of objects over continuous narrow bands nondestructively and rapidly. The technology reflects the physical structure, chemical composition and other phenotypic characteristics of a sample, and provides a rich data layer for understanding the "black box" between genotype and the final visible traits. Although studies have attempted to introduce spectral data into breeding prediction, integrating high-dimensional genomic data with high-dimensional hyperspectral data efficiently remains a challenge. Existing methods often lack systematic optimization of both the fusion mode of the two data types and the preprocessing strategy for hyperspectral data, so information utilization is low and prediction performance is limited.
Therefore, in maize hybrid breeding, there is an urgent need for a novel prediction method that systematically integrates genomic and hyperspectral phenotypic data. The method must solve key problems such as data preprocessing, information fusion and model construction in order to predict hybrid traits such as yield and quality more accurately and reliably, thereby providing a powerful decision tool for modern precision breeding.

Disclosure of Invention

The invention provides a whole genome prediction method combined with corn kernel hyperspectral data. The method first obtains and preprocesses the genome and hyperspectral data of the parents and estimates the corresponding data matrices of the hybrids. It then constructs and compares separate genome and hyperspectral prediction models, selects the optimal hyperspectral preprocessing method, and finally fuses the two data types to construct a hyperspectral-assisted genome prediction model. Cross-validation and evaluation with several statistical methods show that the method significantly improves prediction accuracy for multiple agronomic traits of corn hybrids, such as plant height, ear length and moisture content, and provides an effective tool for precision breeding.
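
The comparison of a genome-only model with a fused model under repeated ten-fold cross-validation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method disclosed here: a plain ridge regression stands in for the statistical methods named in the claims, fusion is done by simple column concatenation (the text does not specify the fusion mechanism), predictive ability is scored as the Pearson correlation between predicted and observed values, and all function names and the synthetic data are hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    """Ridge regression on training-centred data; returns predictions for X_te."""
    n, p = len(X_tr), len(X_tr[0])
    mu_x = [sum(row[j] for row in X_tr) / n for j in range(p)]
    mu_y = sum(y_tr) / n
    Xc = [[row[j] - mu_x[j] for j in range(p)] for row in X_tr]
    yc = [v - mu_y for v in y_tr]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve(A, rhs)
    return [mu_y + sum((row[j] - mu_x[j]) * beta[j] for j in range(p)) for row in X_te]

def repeated_cv(X, y, k=10, repeats=5, seed=0):
    """Mean Pearson correlation over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        preds = [None] * n
        for f in range(k):
            test = idx[f::k]                      # every k-th shuffled index
            held = set(test)
            train = [i for i in idx if i not in held]
            out = ridge_fit_predict([X[i] for i in train], [y[i] for i in train],
                                    [X[i] for i in test])
            for i, v in zip(test, out):
                preds[i] = v
        scores.append(pearson(preds, y))
    return sum(scores) / len(scores)

def fuse(G, H):
    """Fuse genomic and spectral features by column concatenation (assumption)."""
    return [g + h for g, h in zip(G, H)]

# Synthetic hybrids: 3 marker features, 2 spectral features (all made up).
rng = random.Random(1)
G = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
H = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(60)]
y = [2.0 * g[0] - g[1] + 1.5 * h[0] + rng.gauss(0, 0.1) for g, h in zip(G, H)]

acc_genome = repeated_cv(G, y)          # genome-only model
acc_fused = repeated_cv(fuse(G, H), y)  # hyperspectral-assisted model
```

The fused model scores higher here only because the synthetic phenotype was built to depend on a spectral feature; the sketch shows the evaluation procedure and says nothing about accuracy on real data.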
In one aspect, the invention provides a whole genome prediction method combined with corn kernel hyperspectral data, which adopts the following technical scheme: a whole genome prediction method combined with corn kernel hyperspectral data, comprising the steps of: S1, data preparation, namely acquiring genome data of the parents, hyperspectral data of the kernels of the parents, and phenotype data of the hybrids; S2, data preprocessing, namely performing quality control on the parental genome data, filtering out single nucleotide polymorphism markers with a minor allele frequency below a threshold and a missing rate above a threshold, to obtain preprocessed genome data; S3, hybrid data estimation, namely estimating a genome data matrix and a hyperspectral data matrix of the hybrids based on the preprocessed genome data and the preprocessed hyperspectral data, respectively; S4, model construction and screening, namely constructing, by a cross-validation method, a genome-phenotype prediction model and a plurality of preprocessed hyperspectral data-phenotype prediction models, respectively; S5, fusion model construction and evaluation, namely using the hyperspectral data processed by the selected optimal spectral preprocessing method to assist the preprocessed genome data in constructing a hyperspectral data-assisted genome-phenotype prediction model, and evaluating the effect of the fusion prediction model by a plurality of statistical methods. Preferably, the plurality of spectral preprocessing methods in step S2 comprises at least two of scaling, baseline correction, scatter correction and smoothing, and specifically comprises at least two of centering, normalization, continuum removal, baseline correction, first derivative, second deri