CN-121999882-A - Large model construction method and system for precise breeding planning of multi-source data collaborative pears
Abstract
The invention belongs to the technical field of breeding planning, and particularly relates to a large model construction method and system for precise breeding planning of multi-source data collaborative pears. The method acquires a multi-source collaborative data set and constructs a five-dimensional association matrix, establishing a unified association basis for the multi-dimensional data. On this basis, a constraint knowledge expression structure is constructed and a constraint-consistency reasoning model is trained. A user breeding instruction is parsed in a constraint-driven manner to generate a joint constraint condition set containing a gene feasible region and multi-phenotype collaborative targets. Germplasm resources are then retrieved under a cross-domain feature alignment and feature-level desensitization framework; phenotype-genotype collaborative screening, dynamic parent matching and multi-generation genetic evolution deduction are executed to generate breeding planning simulation data. Finally, an optimal breeding scheme is output through a multi-dimensional evaluation system, and the model is continuously updated with verification data. The invention unifies balanced multi-phenotype collaborative improvement with stable gene transfer, and significantly improves breeding efficiency and variety adaptability.
Inventors
- WU JUN
- TAO ZHIQING
- WANG RUNZE
- SUN MANYI
- GU LICHUAN
- WANG QINGYONG
- ZHANG DONGLE
- SHI CONG
- CHEN SHULIN
Assignees
- Anhui Agricultural University (安徽农业大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260127
Claims (10)
- 1. A large model construction method for precise breeding planning of multi-source data collaborative pears, characterized by comprising the following steps: obtaining a target variety type, a core improvement trait, a yield target, a quality index and a stress resistance requirement; inputting the target variety type, the core improvement trait, the yield target, the quality index and the stress resistance requirement into a breeding planning reasoning model to obtain a joint constraint condition set containing a gene feasible region and multi-phenotype collaborative quantification targets; based on the joint constraint condition set, retrieving germplasm resources under a cross-domain feature alignment and feature-level desensitization framework, and executing phenotype-genotype collaborative screening, dynamic parent matching and multi-generation genetic evolution deduction to generate breeding planning simulation data; constructing, for the breeding planning simulation data, a multi-dimensional evaluation system comprising gene stability, multi-phenotype collaborative consistency, environmental suitability and implementation feasibility, and performing comprehensive ranking to obtain an optimal breeding planning scheme; wherein the breeding planning reasoning model is constructed by: constructing a five-dimensional association matrix of variety ID, multi-phenotype, whole genome, environment and knowledge based on a multi-source collaborative data set consisting of multi-phenotype quantitative data, whole genome data, environmental data and domain knowledge data; and constructing a constraint knowledge expression structure based on the five-dimensional association matrix, introducing a constrained reasoning engine, performing constraint-driven training with a pear-domain professional corpus that covers all phenotypes in a balanced manner, and fusing a multi-modal knowledge index with a dynamic association retrieval mechanism to form the breeding planning reasoning model.
- 2. The method for constructing a large model for precise pear breeding planning using multi-source data according to claim 1, wherein the multi-source collaborative dataset comprises multi-phenotype quantified data, whole genome data, environmental data, and domain knowledge data, wherein each phenotype in the multi-phenotype quantified data is completely equal in data structure, call interface, and optimization status.
- 3. The large model construction method for precise pear breeding planning by utilizing multi-source data according to claim 2, characterized in that the multi-phenotype quantitative data comprise stone cell characteristic data, fruit quality data, stress resistance data and growth characteristic data, all phenotype data following a consistent acquisition standard and having equal status; the stone cell characteristic data comprise the area occupied by stone cells, equivalent particle size and distribution density; the fruit quality data comprise sugar content, fruit hardness and fruit shape index; the stress resistance data comprise drought resistance, disease resistance and cold resistance; the growth characteristic data comprise maturity, plant height and fruiting branch proportion; the whole genome data comprise SNP loci and InDel markers obtained by resequencing, together with candidate functional gene information related to the phenotypes; and the environmental data comprise annual average temperature, precipitation, humidity and soil organic matter content.
- 4. The method for constructing a large model for precise pear breeding planning by multi-source data cooperation according to claim 1, wherein constructing the five-dimensional association matrix comprises storing phenotype fields and genotype fields in parallel with the variety ID as the primary key, and establishing, for each phenotype field, a bidirectional index between that phenotype field and the genotype variation features, wherein the bidirectional indexes are used for calculating the association strength of each gene-phenotype collaborative unit, and the same index generation rule and the same call priority are adopted for every phenotype.
- 5. The large model construction method for precise pear breeding planning by utilizing multi-source data according to claim 1, characterized in that the constraint knowledge expression structure is a rule-based structure for constraining the boundary of feasible relations, and comprises at least two of a regulation constraint, a collaborative constraint, an inhibition constraint, an environmental influence constraint and a production area adaptation constraint.
- 6. The large model construction method for precise pear breeding planning by multi-source data collaboration of claim 1, characterized in that the constraint-driven training comprises: constructing a phenotype coverage consistency loss function, a gene stability transfer loss function and an environment adaptation constraint loss function, which together serve as the training objectives; combining the phenotype coverage consistency loss function, the gene stability transfer loss function and the environment adaptation constraint loss function into a joint loss function by weighted summation; and optimizing the joint loss function with a gradient descent algorithm, so that the breeding planning reasoning model simultaneously satisfies the triple constraints of stable gene transfer, balanced multi-phenotype collaboration and environmental adaptation, wherein conflicts among the loss functions are dynamically adjusted through the constraint knowledge expression structure during training.
- 7. The method for constructing a large model for precise pear breeding planning using multi-source data according to claim 1, wherein the generating of the set of joint constraints includes decomposing user breeding instructions into a set of target phenotypes, a set of target producing regions, and a set of genetic constraints, and generating a compliance threshold or a compliance interval for each target phenotype.
- 8. The method for constructing a large model for precise pear breeding planning by utilizing multi-source data according to claim 1, wherein the feature level desensitization framework comprises the steps of reserving phenotype quantification values and key mutation site information required for gene-phenotype collaborative screening, and shielding variety source units, original acquisition geographical coordinates, non-associated genome segments and identity information.
- 9. The method for constructing the large model for precise pear breeding planning by utilizing multi-source data according to claim 1, wherein the dynamic parent matching comprises searching for parent combinations with a genetic algorithm, taking balanced multi-phenotype collaborative gain and stable gene transfer as fitness constraints; and the multi-generation genetic evolution deduction comprises jointly updating, for each generation, the transmission probability of key gene loci and the probability of each phenotype reaching its target under hybridization, backcrossing and molecular marker-assisted selection, and simultaneously outputting the gene homozygosity trend, the genetic stability score and the multi-phenotype collaborative consistency score of each generation.
- 10. A large model construction system for precise breeding planning of multi-source data collaborative pears, used for implementing the method of any one of claims 1-9, characterized by comprising: an instruction acquisition module for obtaining the target variety type, the core improvement trait, the yield target, the quality index and the stress resistance requirement; a joint constraint generation module for inputting the target variety type, the core improvement trait, the yield target, the quality index and the stress resistance requirement into a breeding planning reasoning model to obtain a joint constraint condition set containing a gene feasible region and multi-phenotype collaborative quantification targets; a collaborative screening and genetic deduction module for retrieving germplasm resources under a cross-domain feature alignment and feature-level desensitization framework based on the joint constraint condition set, and performing phenotype-genotype collaborative screening, dynamic parent matching and multi-generation genetic evolution deduction to generate breeding planning simulation data; and an evaluation and output module for constructing a multi-dimensional evaluation system for the breeding planning simulation data and performing comprehensive ranking to obtain an optimal breeding planning scheme.
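As an illustration of the weighted-summation training objective recited in claim 6, the following minimal Python sketch combines three loss terms into a joint loss and minimizes it by gradient descent. The quadratic loss terms, the scalar parameter theta, the weights and the targets are all hypothetical stand-ins chosen for illustration; the claims do not specify the form of the individual loss functions, and the dynamic conflict adjustment through the constraint knowledge expression structure is not modeled here.

```python
# Illustrative sketch only. The three quadratic terms are hypothetical
# stand-ins for the phenotype coverage consistency loss, the gene stability
# transfer loss and the environment adaptation constraint loss of claim 6.

WEIGHTS = (0.5, 0.3, 0.2)   # assumed weights of the three loss terms
TARGETS = (1.0, 2.0, 4.0)   # assumed optimum of each individual term


def joint_loss(theta, weights=WEIGHTS, targets=TARGETS):
    """Weighted sum of the three per-constraint losses."""
    return sum(w * (theta - t) ** 2 for w, t in zip(weights, targets))


def joint_grad(theta, weights=WEIGHTS, targets=TARGETS):
    """Derivative of joint_loss with respect to theta."""
    return sum(2.0 * w * (theta - t) for w, t in zip(weights, targets))


def train(theta=0.0, lr=0.1, steps=200):
    """Plain gradient descent on the joint loss."""
    for _ in range(steps):
        theta -= lr * joint_grad(theta)
    return theta


if __name__ == "__main__":
    theta = train()
    print(f"theta = {theta:.4f}, joint loss = {joint_loss(theta):.4f}")
```

Because the three terms pull toward different optima, the minimizer settles at a compromise between them (for these quadratic stand-ins, the weighted mean of the three targets). This is the sense in which a weighted sum trades off conflicting constraints; changing the weights shifts the compromise toward the constraint given more weight.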
Description
Large model construction method and system for precise breeding planning of multi-source data collaborative pears
Technical Field
The invention belongs to the technical field of breeding planning, and particularly relates to a large model construction method and a large model construction system for precise breeding planning of multi-source data collaborative pears.
Background
Pear is an important commercial fruit tree crop in China, and varietal improvement plays a key role in the development of the pear industry. However, traditional breeding relies on experience-based parent selection. Because multi-source data (such as phenotypic, genomic and environmental data) are stored in a scattered manner under differing standards, cross-dimensional association analysis cannot be performed; as a result, the breeding cycle is as long as 15-20 years, parent selection is largely blind, and heterosis is difficult to exploit fully. With the application of molecular biology and artificial intelligence, existing methods attempt to shift toward design breeding, but most are limited to a single data dimension or to single-phenotype optimization and neglect multi-source collaboration. For example, some techniques sacrifice data integrity for privacy protection; others lack a constraint mechanism when applying large models, which leads to planning deviations, and their intent recognition relies on simple part-of-speech statistics, which does not meet the core requirements of breeding scenarios. Consequently, balanced multi-phenotype collaboration, stable gene transfer and environmental suitability are difficult to guarantee together, which limits breeding efficiency and variety adaptability. A multi-source data collaboration method based on constraint reasoning is therefore needed to fundamentally solve the systematic inefficiency caused by the lack of such a framework.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide a large model construction method and a large model construction system for precise breeding planning of multi-source data collaborative pears, which significantly shorten the breeding cycle, improve the precision of parent selection and the utilization of heterosis, promote the transformation of pear breeding from experience-driven practice to constraint-driven, multi-source collaborative precision design, ensure stable gene transfer, and enhance the adaptability of varieties to the environment of the target production area.
The technical scheme of the invention is as follows: a large model construction method for precise breeding planning of multi-source data collaborative pears comprises the following steps: obtaining a target variety type, a core improvement trait, a yield target, a quality index and a stress resistance requirement; inputting the target variety type, the core improvement trait, the yield target, the quality index and the stress resistance requirement into a breeding planning reasoning model to obtain a joint constraint condition set containing a gene feasible region and multi-phenotype collaborative quantification targets; based on the joint constraint condition set, retrieving germplasm resources under a cross-domain feature alignment and feature-level desensitization framework, and executing phenotype-genotype collaborative screening, dynamic parent matching and multi-generation genetic evolution deduction to generate breeding planning simulation data; constructing, for the breeding planning simulation data, a multi-dimensional evaluation system comprising gene stability, multi-phenotype collaborative consistency, environmental suitability and implementation feasibility, and performing comprehensive ranking to obtain an optimal breeding planning scheme. The breeding planning reasoning model is constructed by the following steps: constructing a five-dimensional association matrix of variety ID, multi-phenotype, whole genome, environment and knowledge based on a multi-source collaborative data set consisting of multi-phenotype quantitative data, whole genome data, environmental data and domain knowledge data; and constructing a constraint knowledge expression structure based on the five-dimensional association matrix, introducing a constrained reasoning engine, performing constraint-driven training with a pear-domain professional corpus that covers all phenotypes in a balanced manner, and fusing a multi-modal knowledge index with a dynamic association retrieval mechanism to form the breeding planning reasoning model.
Preferably, the multi-source collaborative data set comprises multi-phenotype quantitative data, whole genome data, environmental data and domain knowledge data, wherein each phenotype in the multi-phenotype quantitative data is completely equal in data structure, call interface and optimization status. The multi-phenotype quantitative data comprise stone cell characteristic data, fruit quality data, stress resistance data