CN-122024826-A - Live pig genetic evaluation system based on multi-omics fusion
Abstract
The invention belongs to the field of data processing and relates in particular to a live pig genetic evaluation system based on multi-omics fusion. The system comprises a heterogeneous data lakehouse platform, a dynamic environment-aware encoding device, a biological knowledge graph storage server, a causal graph neural network processing center, a central service cluster, and edge computing terminals distributed across farms. The platform integrates multi-source heterogeneous omics data. The encoding device aligns environmental data with omics data in space and time to construct a data cube. The processing center performs counterfactual inference in combination with the knowledge graph to extract robust genotype-to-phenotype causal paths from complex associations. The central cluster and the edge terminals cooperate to train the model through federated learning. By deeply fusing multi-omics and environmental data and applying causal inference, the invention eliminates spurious associations caused by environmental noise, improves the accuracy, robustness, and interpretability of genetic evaluation, and addresses the problems of computational load and data privacy and security.
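The counterfactual inference and spurious-association removal summarized above can be illustrated with a minimal sketch. It substitutes a deliberately simple technique that the source does not specify: a linear structural causal model per omics feature, expression = a·genotype + b·environment + c + u, fitted by ordinary least squares. Under additive noise, the counterfactual value for an observed animal with the environment held at an undisturbed baseline is the observation minus b·(environment - baseline). The environmental share of each feature's variation then serves as the pruning criterion. The function name `counterfactual_prune`, the baseline of 0, the 0.5 threshold, and the synthetic data are all hypothetical. The patent describes a causal graph neural network, not this regression.

```python
import numpy as np

def counterfactual_prune(genotype, env, expr, env_baseline=0.0, max_env_share=0.5):
    """Hypothetical sketch of counterfactual effects plus causal pruning.

    Assumes a linear structural model per omics feature j:
        expr[:, j] = a_j * genotype + b_j * env + c_j + u_j
    Returns (counterfactual expression, environmental share per feature, keep mask).
    """
    genotype = np.asarray(genotype, dtype=float)
    env = np.asarray(env, dtype=float)
    expr = np.asarray(expr, dtype=float)
    if expr.ndim == 1:
        expr = expr[:, None]

    # Fit a_j, b_j, c_j for every feature at once by ordinary least squares.
    X = np.column_stack([genotype, env, np.ones_like(env)])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    a, b = coef[0], coef[1]

    # Abduction keeps each animal's residual u_j. The action sets env to the
    # baseline, so the counterfactual is the observation minus b_j * (env - baseline).
    env_effect = np.outer(env - env_baseline, b)   # observed minus counterfactual
    counterfactual = expr - env_effect

    # Mean magnitude of the environmental vs. genetic contributions per feature.
    gen_effect = np.outer(genotype - genotype.mean(), a)
    env_mag = np.abs(env_effect).mean(axis=0)
    gen_mag = np.abs(gen_effect).mean(axis=0)
    env_share = env_mag / (env_mag + gen_mag + 1e-12)

    # Causal pruning: drop genotype -> feature paths dominated by the environment.
    keep = env_share <= max_env_share
    return counterfactual, env_share, keep

# Synthetic, hypothetical data: 200 pigs, SNP dosage 0/1/2, one standardized
# environmental deviation, and two omics features.
rng = np.random.default_rng(0)
n = 200
g = rng.integers(0, 3, size=n)
e = rng.normal(size=n)
noise = rng.normal(scale=0.1, size=(n, 2))
expr = np.column_stack([1.0 * g + 0.05 * e, 0.05 * g + 2.0 * e]) + noise
cf, share, keep = counterfactual_prune(g, e, expr)
print(keep)  # [ True False]
```

In the synthetic example, feature 0 is driven mainly by genotype and is kept. Feature 1 is driven mainly by the environment and is pruned. The linear form is chosen only because its counterfactual has a closed form. A system like the one claimed would also need the knowledge-graph structure and a nonlinear model.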
Inventors
- MENG XIAOYUN
- FANG DUO
- WANG YABIN
- TAN CHENXI
- WEI JINDUI
- YANG JINGYI
- LIU WENJUAN
- LI YANGXING
Assignees
- 深圳市金新农科技股份有限公司 (Shenzhen Jinxinnong Technology Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20260407
Claims (10)
- 1. A live pig genetic evaluation system based on multi-omics fusion, characterized by comprising a heterogeneous data lakehouse platform, a dynamic environment-aware encoding device, a biological knowledge graph storage server, a causal graph neural network processing center, a central service cluster, and edge computing terminals distributed across farms; the heterogeneous data lakehouse platform is used for storing and integrating multi-source heterogeneous omics data of live pigs and establishing a unified, semantics-preserving storage structure; the dynamic environment-aware encoding device is used for collecting farm environmental parameters and equipment operating-state data, generating an environmental event sequence matched to the omics data through a spatiotemporal alignment algorithm, and constructing a data cube; the biological knowledge graph storage server is used for maintaining a knowledge association network comprising biomolecule nodes and their regulatory pathways, and for providing prior biological hierarchical structure support for genetic evaluation; the causal graph neural network processing center is used for mapping node attributes in the data cube onto the knowledge association network and extracting genotype-to-phenotype causal paths by performing counterfactual inference and causal pruning operations; the central service cluster is used for storing globally shared model parameters, learning general causal patterns across regions, and issuing global control instructions; the edge computing terminal is used for receiving the causal skeleton issued by the central service cluster, personalizing the model by fine-tuning it on real-time data from the local farm, and feeding the updated parameters back to the central service cluster.
- 2. The live pig genetic evaluation system based on multi-omics fusion according to claim 1, wherein the heterogeneous data lakehouse platform adopts, at the physical layer, a hybrid storage architecture combining a distributed file system with a relational database, and its logical architecture is divided into a raw data layer, a metadata index layer, and a semantic preservation layer; the raw data layer is used for receiving and storing raw genome sequencing files, transcriptome expression profile matrices, epigenome methylation profile data, and metabolome metabolite concentration data of live pigs; the metadata index layer is used for establishing, for each type of omics data, a tagging system comprising sampling time, individual identification code, tissue source, and sequencing platform parameters, so as to enable cross-modal retrieval; the semantic preservation layer is used for mapping omics features of different dimensions into a unified mathematical feature space through a preset data dictionary and ontology mapping rules, ensuring that data from different omics levels keep a consistent biological meaning during fusion computation; the heterogeneous data lakehouse platform further integrates a data cleaning unit, which is used for quality control of the collected multi-omics data, removing abnormal samples whose sequencing depth is insufficient or whose signal-to-noise ratio is below a preset threshold, and imputing missing omics feature values with an interpolation algorithm.
- 3. The live pig genetic evaluation system based on multi-omics fusion according to claim 2, wherein the dynamic environment-aware encoding device comprises a sensor array deployed in the pig house and an edge data gateway connected to the sensor array; the sensor array is used for collecting environmental parameters including temperature, humidity, ammonia concentration, carbon dioxide concentration, and light intensity; the equipment operating-state data are obtained by connecting the edge data gateway to the industrial control buses of the farm's automatic feeding system, automatic drinking water system, and manure-cleaning robot, and cover the dispensing frequency of the automatic feeders, the instantaneous flow rate of the drinking equipment, and the operating cycle of the manure-cleaning system; the dynamic environment-aware encoding device integrates a spatiotemporal alignment module, which executes a frequency-matching algorithm to align the high-frequency environmental parameters with the low-frequency omics data in space and time; using a sliding-window technique, the spatiotemporal alignment module calculates the cumulative environmental effect and the fluctuation standard deviation over a preset period before each omics sample is collected, converts them into a semantic environmental event sequence, and, by assigning semantic weights to the environmental variables, constructs a data cube integrating the three dimensions of phenotype, omics, and environment.
- 4. The live pig genetic evaluation system based on multi-omics fusion according to claim 3, wherein the knowledge association network stored in the biological knowledge graph storage server comprises gene nodes, transcript nodes, protein nodes, metabolite nodes, and the regulatory relationships among these nodes; the regulatory relationships encompass transcriptional regulation, protein-protein interactions, enzymatic reaction pathways, and signaling pathways; the biological knowledge graph storage server covers the growth hormone regulatory pathway, immune response signaling pathways, and the feed conversion metabolic network related to live pig production traits, with each pathway node carrying a corresponding biological function annotation; the biological knowledge graph storage server is configured with a dynamic update mechanism that periodically retrieves the latest pig research findings and gene function information from public bioinformatics databases through a data interface, automatically updates the logical connections between nodes, and outputs them as a causal skeleton to the causal graph neural network processing center, thereby imposing biological logic constraints on the learning process of the neural network.
- 5. The live pig genetic evaluation system based on multi-omics fusion according to claim 4, wherein the causal graph neural network processing center comprises a graph structure encoder, a causal inference module, and a causal pruning module; the graph structure encoder is used for mapping multi-omics features, as node attributes, onto the topology of the biological knowledge graph, mapping single nucleotide polymorphism (SNP) sites to the corresponding gene nodes, and, in combination with transcriptome expression levels and protein abundance, encoding them into hierarchically structured subgraph feature vectors; the causal inference module is used for executing deep learning operations based on a counterfactual framework, treating sudden environmental fluctuations captured by the dynamic environment-aware encoding device as external intervention variables, and calculating the intervention effect of the environment on omics features by comparing the actually observed omics expression values with the counterfactual values inferred under the assumption that the environment was undisturbed; the causal pruning module is used for identifying and eliminating, according to the strength of the intervention effect, spurious association paths produced by environmental noise or non-heritable epigenetic modifications as information propagates through the neural network, thereby locking in the core causal features that are genetically stable.
- 6. The live pig genetic evaluation system based on multi-omics fusion according to claim 5, wherein the central service cluster comprises a global model repository, a federated aggregation server, and a causal path verification module; the global model repository is used for storing the global causal graph neural network parameters and the globally unified biological knowledge graph obtained after training on data from multiple farms; the federated aggregation server is used for running a federated averaging algorithm, receiving the model parameter update gradients or causal path statistical summaries uploaded by the edge computing terminals without accessing any farm's raw private data, and updating the weight coefficients of the global model by computing a weighted average of the update gradients; the causal path verification module is used for monitoring the generalization performance of the global model under different environmental backgrounds and, upon detecting that the prediction error of the global model in a specific region exceeds a preset tolerance threshold, triggering a structural optimization process for the global knowledge graph and issuing a personalized fine-tuning instruction to the corresponding edge computing terminal.
- 7. The live pig genetic evaluation system based on multi-omics fusion according to claim 6, wherein the edge computing terminal is provided with a lightweight neural network inference engine, an adaptive learning unit, and a data privacy protection module; the lightweight neural network inference engine is used for receiving the global causal skeleton issued by the central service cluster and, in combination with real-time environmental data and omics data from the local farm, outputting near-real-time evaluations of the genetic merit of live pigs; the evaluation result comprises an estimated breeding value and a causal contribution analysis report indicating the paths through which genetic loci influence production performance; the adaptive learning unit is used for performing few-shot learning and hyperparameter tuning of the global model according to the genetic background of the local pig population, so as to capture the gene-by-environment interaction effects specific to the local environment; the data privacy protection module is used for adding noise to, or encrypting, the update gradients before the parameters are uploaded, so that individual phenotype data from the farm cannot be reconstructed during transmission.
- 8. The live pig genetic evaluation system based on multi-omics fusion according to claim 7, further comprising a visual interactive interface and an early warning module; the visual interactive interface is used for displaying a distribution map of the genetic potential evaluations of the entire herd and providing an interactive causal path topology view, so that a user can click nodes in the view to trace the molecular regulatory logic behind production traits and observe the expression robustness of key genes under different environmental stresses; the early warning module is used for monitoring fluctuations in genetic evaluation values in real time and, when an abnormal decline in a live pig's evaluation value is detected and the causal path output by the causal graph neural network processing center points to strong environmental interference or disease risk, automatically sending a risk alert to the manager's mobile terminal and simultaneously displaying suggested husbandry adjustments.
- 9. The live pig genetic evaluation system based on multi-omics fusion according to claim 8, wherein the causal graph neural network processing center is further provided with a methylation-site-sensitive module, which is used for processing epigenome data and analyzing the sequencing results to distinguish reversible methylation modifications induced by short-term environmental conditions from long-term epigenetic variation with transgenerational inheritance potential; according to the classification made by the methylation-site-sensitive module, the causal graph neural network processing center assigns different weight coefficients to these two kinds of epigenetic signals, so as to eliminate transient, environment-induced phenotypic bias and improve the reliability of genetic breeding; in addition, the causal graph neural network processing center adopts a multi-task learning framework to model reproductive traits, meat quality traits, disease resistance traits, and body conformation traits causally in parallel, and, by sharing an underlying causal skeleton, captures the genetic correlations among different traits and their common regulatory mechanisms at the molecular level.
- 10. The live pig genetic evaluation system based on multi-omics fusion according to claim 9, wherein the system adopts a distributed cooperative architecture that decomposes large-scale genetic evaluation tasks into multiple parallel subtasks completed cooperatively by distributed edge nodes and the central cluster; the system is further provided with an automated breeding decision optimization engine, which takes genetic evaluation results, inbreeding coefficient control requirements, and market demand forecasts as input variables and uses a reinforcement learning algorithm to automatically plan multi-generation breeding paths, introduction plans, and culling criteria; the edge computing terminal is further connected to an automatic weighing device and a behavior-monitoring camera, automatically obtains body conformation parameters and activity data of live pigs through machine vision algorithms, and feeds them back to the heterogeneous data lakehouse platform as supplementary phenotype-layer data.
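The sliding-window computation described in claim 3 can be sketched as follows. The sketch assumes hourly temperature readings and omics sampling timestamps expressed in hours. The source does not define the "cumulative environmental effect", so this sketch assumes it is the exceedance above a comfort limit summed over the window (degree-hours). The function names `align_environment` and `to_event`, the 24-hour window, the 25 °C limit, and the event thresholds are hypothetical choices for illustration only. Semantic weighting of the variables is not shown.

```python
import numpy as np

def align_environment(env_times, env_values, sample_times,
                      window=24.0, comfort_max=25.0, dt=1.0):
    """Summarise high-frequency readings in [t - window, t) for each omics sample time t.

    Returns an array of shape (len(sample_times), 2) holding the hypothetical
    cumulative effect (exceedance above comfort_max times dt) and the standard deviation.
    """
    env_times = np.asarray(env_times, dtype=float)
    env_values = np.asarray(env_values, dtype=float)
    rows = []
    for t in sample_times:
        in_window = (env_times >= t - window) & (env_times < t)
        w = env_values[in_window]
        if w.size == 0:
            rows.append((np.nan, np.nan))  # no readings before this sample
            continue
        accumulation = np.clip(w - comfort_max, 0.0, None).sum() * dt
        rows.append((accumulation, w.std()))
    return np.array(rows)

def to_event(accumulation, fluctuation, acc_threshold=10.0, std_threshold=2.0):
    """Map the two window statistics to a hypothetical semantic event label."""
    if np.isnan(accumulation):
        return "no_data"
    if accumulation > acc_threshold:
        return "heat_load"
    if fluctuation > std_threshold:
        return "unstable"
    return "stable"

# Hypothetical example: 48 hourly readings, 24 °C for one day, then 28 °C for one day.
times = np.arange(48.0)
temps = np.where(times < 24, 24.0, 28.0)
features = align_environment(times, temps, sample_times=[24.0, 48.0])
events = [to_event(acc, fluct) for acc, fluct in features]
print(features.tolist())  # [[0.0, 0.0], [72.0, 0.0]]
print(events)             # ['stable', 'heat_load']
```

Each omics sample thus receives one row of window statistics and one event label. This row is the environmental slice that, in the claimed system, would sit alongside the phenotype and omics slices of the data cube.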
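The aggregation in claim 6 and the noise-adding step in claim 7 can be sketched together. The sketch assumes each edge terminal uploads a parameter delta (its local parameters minus the current global parameters) instead of raw data. The clip-then-add-Gaussian-noise step follows the common differential-privacy pattern, but calibrating `noise_std` to a formal privacy budget is not attempted here. The names `privatize` and `fed_avg`, the farm sample counts, and all parameter values are hypothetical.

```python
import numpy as np

def privatize(delta, clip_norm=1.0, noise_std=0.1, rng=None):
    """Edge side (hypothetical): clip the update's L2 norm, then add Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(delta)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return delta * scale + rng.normal(scale=noise_std, size=delta.shape)

def fed_avg(global_params, deltas, num_samples):
    """Server side (hypothetical): sample-size-weighted average of uploaded deltas."""
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()
    avg_delta = sum(w * d for w, d in zip(weights, deltas))
    return global_params + avg_delta

# Two hypothetical farms; noise_std=0.0 keeps the example deterministic.
rng = np.random.default_rng(42)
global_params = np.zeros(2)
local_deltas = [np.array([1.0, 0.0]), np.array([0.0, 3.0])]
uploads = [privatize(d, clip_norm=1.0, noise_std=0.0, rng=rng) for d in local_deltas]
new_global = fed_avg(global_params, uploads, num_samples=[300, 100])
print(new_global)  # [0.75 0.25]
```

Weighting by local sample count mirrors the federated averaging algorithm named in claim 6. Because every farm starts from the same global parameters, adding the weighted average of the deltas to the global model gives the same result as averaging the farms' updated parameters. The second farm's delta is clipped from norm 3 to norm 1 before averaging, which limits how much any single farm's data can shift the global model.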
Description
Live pig genetic evaluation system based on multi-omics fusion
Technical Field
The invention belongs to the field of data processing and relates in particular to a live pig genetic evaluation system based on multi-omics fusion.
Background
As modern animal husbandry moves toward precision and intelligent management, genetic evaluation, as a core link of the pig breeding system, is of great significance for improving the efficiency of breeding stock selection and accelerating the genetic progress of populations. Scientific evaluation of traits such as growth and development, disease resistance, and reproductive performance provides core data support for breeding decisions. With the spread of genomics technology, in-depth analysis of large-scale genetic data by bioinformatics methods has become a key technical route for improving the competitiveness of the pig industry and safeguarding meat food safety.
Genetic evaluation based on multi-omics fusion is a current research focus in animal science. It integrates multidimensional biological data such as genome, transcriptome, epigenome, and metabolome data to build complex association models linking microscopic molecular variation to macroscopic production phenotypes. By mining the synergistic effects among the different omics levels in depth, the technique aims to overcome the limitations of traditional pedigree-only or genome-only selection in analyzing complex traits, to characterize the genetic potential of live pigs more comprehensively, and to guide precision breeding. However, existing genetic evaluation techniques face several problems when processing multi-omics data.
Traditional evaluation systems rely on single-dimensional genetic information and struggle to balance evaluation accuracy with large-scale computational efficiency; fusing multi-omics data often makes computational complexity grow exponentially, so real-time evaluation requirements cannot be met. Existing models generally lack the ability to perceive gene-by-environment interactions dynamically in complex farming environments, are severely affected by environmental noise and equipment errors, and lack robustness when deployed across different environmental scenarios. Traditional black-box models often ignore the biological hierarchy within omics data, leaving a biological semantic gap; they have difficulty distinguishing superficial statistical correlation from biological causation and are easily influenced by non-heritable epigenetic modifications, which can mislead the direction of breeding.
Disclosure of Invention
The invention aims to provide a live pig genetic evaluation system based on multi-omics fusion that solves the problems described in the background.
To achieve the above object, the invention adopts the following technical solution: a live pig genetic evaluation system based on multi-omics fusion, comprising a heterogeneous data lakehouse platform, a dynamic environment-aware encoding device, a biological knowledge graph storage server, a causal graph neural network processing center, a central service cluster, and edge computing terminals distributed across farms, as follows: the heterogeneous data lakehouse platform is used for storing and integrating multi-source heterogeneous data, which include genome, transcriptome, epigenome, and metabolome data of live pigs, and for establishing a unified, semantics-preserving storage structure; the dynamic environment-aware encoding device is used for collecting farm environmental parameters and equipment operating-state data in real time, semantically annotating them through a spatiotemporal alignment algorithm, generating an environmental event sequence matched to the omics data, and constructing a data cube integrating the three dimensions of phenotype, omics, and environment; the biological knowledge graph storage server is used for pre-constructing and maintaining a knowledge association network containing genes, transcripts, proteins, metabolites, and known biological regulatory pathways, and for providing prior biological hierarchical structure support for genetic evaluation; the causal graph neural network processing center is used for mapping node attributes in the data cube onto the network structure provided by the biological knowledge graph storage server, and extracting robust genotype-to-phenotype causal paths from complex omics associations by performing counterfactual inference and causal pruning o