CN-122024837-A - Method and system for inferring a chicken gene-pathogen interaction network using a graph neural network
Abstract
The invention provides a method and a system for inferring a chicken gene-pathogen interaction network using a graph neural network, and relates to the technical field of data processing. The method first acquires multiple sets of raw data from chicken individuals on the same time scale. Through cross-species unified graph structure construction, gene and pathogen sequences are mapped into a unified topological space and an initial interaction graph is constructed based on time-series co-occurrence. Combining continuous phenotype data and pedigree information, dynamic graph representation learning under genetic constraints then yields a graph representation with spatio-temporal features and genetic generalization capability. Candidate interactions are screened out through multi-scale topological structure optimization. Finally, a graph autoencoding discrimination structure that maximizes mutual information automatically identifies and retains the connections critical to network reconstruction, generating the interaction network. The invention thereby infers gene-pathogen interaction relationships with high robustness and biological interpretability from multidimensional dynamic data.
Inventors
- TANG SHI
- YIN HUADONG
- XUE FUGUANG
- CHEN BINLONG
- SUN CAIYUN
- HU FUXIANG
- WANG SINI
- CHEN YUHAN
- DENG MIN
Assignees
- 西昌学院
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-04-10
Claims (10)
- 1. A method for inferring a chicken gene-pathogen interaction network using a graph neural network, comprising: acquiring raw transcript sequencing read data of all chicken individuals at a plurality of consecutive time points, raw pathogen whole-genome sequencing read data of the corresponding chicken individuals at the same time points, raw continuous phenotype records of the chicken individuals at each time point, and raw records of the kinship relationships among the chicken individuals, wherein the raw transcript sequencing read data at the plurality of consecutive time points are acquired on the same preset time scale; performing cross-species unified graph structure construction based on the acquired raw transcript sequencing read data and raw pathogen whole-genome sequencing read data to obtain an initial gene-pathogen interaction graph that fuses time-series signals; performing dynamic graph representation learning under genetic constraints based on the initial gene-pathogen interaction graph, the raw continuous phenotype records, and the raw kinship records to obtain a spatio-temporal graph representation with genetic generalization capability; performing multi-scale topological structure optimization according to the spatio-temporal graph representation to obtain a candidate chicken gene-pathogen interaction network; and performing salient network generation according to the candidate interaction network, in which a preset graph autoencoding discrimination structure that maximizes mutual information automatically identifies and retains the edges whose importance for reconstructing the overall network response pattern is greater than a preset threshold, to obtain the finally inferred chicken gene-pathogen interaction network.
- 2. The method for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 1, wherein performing cross-species unified graph structure construction based on the acquired raw transcript sequencing read data and raw pathogen whole-genome sequencing read data comprises: performing cross-species numerical characterization of sequence fragments according to the raw transcript sequencing read data and the raw pathogen whole-genome sequencing read data, in which statistical feature vectors are constructed separately for chicken gene transcript fragments and pathogen genome fragments to obtain preliminary numerical representations of the sequence fragments, the statistical feature vectors being constructed based on the corresponding sequence composition and read coverage depth; performing unified projection of the heterogeneous spaces according to the preliminary numerical representations of the sequence fragments, in which the fragment feature vectors from chickens and pathogens are normalized and mapped to a common latent space to obtain unified node embeddings of chickens and pathogens; and performing edge initialization based on time-lagged co-occurrence according to the unified node embeddings and the consecutive time point information, in which the time-lagged mutual information of each pair of node embeddings over the cross-sample time series is calculated within a preset sliding time window, and initial edges reflecting the temporal order of responses are dynamically constructed according to the time-lagged mutual information, to obtain the initial gene-pathogen interaction graph that fuses time-series signals.
- 3. The method for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 1, wherein performing dynamic graph representation learning under genetic constraints based on the initial gene-pathogen interaction graph, the raw continuous phenotype records, and the raw kinship records to obtain the spatio-temporal graph representation with genetic generalization capability comprises: performing temporal alignment of node states according to the initial gene-pathogen interaction graph and the raw continuous phenotype records, in which the graph node state vector at each time point is jointly modeled with the multidimensional phenotype observations of the corresponding chicken individual at that time point, wherein a preset recurrent neural network encodes the phenotype time series and the temporal evolution path of the node states is calibrated backward from it, to obtain a node state sequence calibrated in synchrony with the phenotype dynamics; performing subgraph structure constraint based on genetic similarity according to the node state sequence and the raw kinship records, in which a genetic distance matrix is constructed by parsing the kinship relationships, and positive and negative sample pairs are constructed among the graph data of a plurality of chicken individuals based on the genetic distance matrix, to obtain a set of node and edge representations satisfying the genetic consistency constraint; and performing spatio-temporal information fusion according to the set of node and edge representations, in which a preset network structure combining gated temporal convolution with a graph attention mechanism aggregates the historical states of each node in the time dimension and aggregates neighbor information according to the updated edge weights in the spatial dimension, to obtain the spatio-temporal graph representation with genetic generalization capability.
- 4. The method for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 1, wherein performing multi-scale topological structure optimization according to the spatio-temporal graph representation to obtain the candidate chicken gene-pathogen interaction network comprises: performing local direct interaction strength evaluation according to the spatio-temporal graph representation with genetic generalization capability, in which a preset attention network quantifies the importance weight of each edge when local neighborhood information is aggregated, and core edges forming stable local cluster structures are screened out according to the importance weights, to obtain an enhanced local interaction subgraph; performing high-order path dependency mining according to the enhanced local interaction subgraph, in which the attention network captures and evaluates potential regulatory or synergistic effects arising between nodes through a plurality of indirect paths, to obtain a high-order relation graph reflecting global topological relevance; and performing stable core connection pattern fusion and extraction according to the high-order relation graph and the cross-time and cross-sample information contained in the spatio-temporal graph representation, in which the consistency with which each connection occurs across different time points and different genetic-background sample subsets is calculated, and the local interaction edges and high-order association edges are fused and screened based on the calculated consistency, to obtain the candidate chicken gene-pathogen interaction network.
- 5. The method for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 1, wherein performing salient network generation according to the candidate interaction network comprises: performing edge importance quantification according to the candidate interaction network, in which the contribution of each edge in the network to the overall network representation under the mutual information maximization objective is calculated, to obtain an edge set with initial significance weights; performing edge recalibration based on cross-condition stability according to the edge set with initial significance weights, in which the variability of each edge's weight across different time windows and different genetic-background sample subsets is computed, and edges whose variability is less than a preset threshold are screened out, to obtain a core edge subset; and performing network generation oriented toward biological pathway consistency according to the core edge subset, in which a preset graph autoencoder structure is iteratively optimized with reconstruction of the core edge connections as the objective and preset chicken immune or metabolic pathways as a sparsity constraint, and only connections conforming to the prior pathway structure are retained, to obtain the finally inferred chicken gene-pathogen interaction network.
- 6. A system for inferring a chicken gene-pathogen interaction network using a graph neural network, comprising: an acquisition unit configured to acquire raw transcript sequencing read data of all chicken individuals at a plurality of consecutive time points, raw pathogen whole-genome sequencing read data of the corresponding chicken individuals at the same time points, raw continuous phenotype records of the chicken individuals at each time point, and raw records of the kinship relationships among the chicken individuals, wherein the raw transcript sequencing read data are acquired on the same preset time scale; a construction unit configured to perform cross-species unified graph structure construction based on the acquired raw transcript sequencing read data and raw pathogen whole-genome sequencing read data to obtain an initial gene-pathogen interaction graph that fuses time-series signals; a learning unit configured to perform dynamic graph representation learning under genetic constraints based on the initial gene-pathogen interaction graph, the raw continuous phenotype records, and the raw kinship records to obtain a spatio-temporal graph representation with genetic generalization capability; an optimizing unit configured to perform multi-scale topological structure optimization according to the spatio-temporal graph representation to obtain a candidate chicken gene-pathogen interaction network; and a generation unit configured to perform salient network generation according to the candidate interaction network, in which a preset graph autoencoding discrimination structure that maximizes mutual information automatically identifies and retains the edges whose importance for reconstructing the overall network response pattern is greater than a preset threshold, to obtain the finally inferred chicken gene-pathogen interaction network.
- 7. The system for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 6, wherein the construction unit comprises: a first construction subunit configured to perform cross-species numerical characterization of sequence fragments according to the raw transcript sequencing read data and the raw pathogen whole-genome sequencing read data, in which statistical feature vectors are constructed separately for chicken gene transcript fragments and pathogen genome fragments to obtain preliminary numerical representations of the sequence fragments, the statistical feature vectors being constructed based on the corresponding sequence composition and read coverage depth; a second construction subunit configured to perform unified projection of the heterogeneous spaces according to the preliminary numerical representations of the sequence fragments, in which the fragment feature vectors from chickens and pathogens are normalized and mapped to a common latent space to obtain unified node embeddings of chickens and pathogens; and a third construction subunit configured to perform edge initialization based on time-lagged co-occurrence according to the unified node embeddings and the consecutive time point information, in which the time-lagged mutual information of each pair of node embeddings over the cross-sample time series is calculated within a preset sliding time window, and initial edges reflecting the temporal order of responses are dynamically constructed according to the time-lagged mutual information, to obtain the initial gene-pathogen interaction graph that fuses time-series signals.
- 8. The system for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 6, wherein the learning unit comprises: a first learning subunit configured to perform temporal alignment of node states according to the initial gene-pathogen interaction graph and the raw continuous phenotype records, in which the graph node state vector at each time point is jointly modeled with the multidimensional phenotype observations of the corresponding chicken individual at that time point, wherein a preset recurrent neural network encodes the phenotype time series and the temporal evolution path of the node states is calibrated backward from it, to obtain a node state sequence calibrated in synchrony with the phenotype dynamics; a second learning subunit configured to perform subgraph structure constraint based on genetic similarity according to the node state sequence and the raw kinship records, in which a genetic distance matrix is constructed by parsing the kinship relationships, and positive and negative sample pairs are constructed among the graph data of a plurality of chicken individuals based on the genetic distance matrix, to obtain a set of node and edge representations satisfying the genetic consistency constraint; and a third learning subunit configured to perform spatio-temporal information fusion according to the set of node and edge representations, in which a preset network structure combining gated temporal convolution with a graph attention mechanism aggregates the historical states of each node in the time dimension and aggregates neighbor information according to the updated edge weights in the spatial dimension, to obtain the spatio-temporal graph representation with genetic generalization capability.
- 9. The system for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 6, wherein the optimizing unit comprises: a first optimizing subunit configured to perform local direct interaction strength evaluation according to the spatio-temporal graph representation with genetic generalization capability, in which a preset attention network quantifies the importance weight of each edge when local neighborhood information is aggregated, and core edges forming stable local cluster structures are screened out according to the importance weights, to obtain an enhanced local interaction subgraph; a second optimizing subunit configured to perform high-order path dependency mining according to the enhanced local interaction subgraph, in which the attention network captures and evaluates potential regulatory or synergistic effects arising between nodes through a plurality of indirect paths, to obtain a high-order relation graph reflecting global topological relevance; and a third optimizing subunit configured to perform stable core connection pattern fusion and extraction according to the high-order relation graph and the spatio-temporal graph representation, in which the consistency with which each connection occurs across different time points and different genetic-background sample subsets is calculated, and the local interaction edges and high-order association edges are fused and screened based on the calculated consistency, to obtain the candidate chicken gene-pathogen interaction network.
- 10. The system for inferring a chicken gene-pathogen interaction network using a graph neural network according to claim 6, wherein the generation unit comprises: a first generation subunit configured to perform edge importance quantification according to the candidate interaction network, in which the contribution of each edge in the network to the overall network representation under the mutual information maximization objective is calculated, to obtain an edge set with initial significance weights; a second generation subunit configured to perform edge recalibration based on cross-condition stability according to the edge set with initial significance weights, in which the variability of each edge's weight across different time windows and different genetic-background sample subsets is computed, and edges whose variability is less than a preset threshold are screened out, to obtain a core edge subset; and a third generation subunit configured to perform network generation oriented toward biological pathway consistency according to the core edge subset, in which a preset graph autoencoder structure is iteratively optimized with reconstruction of the core edge connections as the objective and preset chicken immune or metabolic pathways as a sparsity constraint, and only connections conforming to the prior pathway structure are retained, to obtain the finally inferred chicken gene-pathogen interaction network.
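The edge initialization recited in claims 2 and 7 can be illustrated with a minimal sketch. The claims do not specify a mutual information estimator, the lag range, or the threshold, so the code below assumes a simple histogram (plug-in) estimator, treats each node as a one-dimensional signal rather than a full embedding, and treats the whole series as a single window. The names `mutual_information`, `lagged_mi_edges`, `max_lag`, `threshold`, and `bins` are hypothetical and chosen only for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def lagged_mi_edges(pathogen_series, gene_series, max_lag=2, threshold=0.3, bins=4):
    """Build initial pathogen -> gene edges from time-lagged mutual information.

    pathogen_series: array of shape (n_pathogen_nodes, n_steps)
    gene_series:     array of shape (n_gene_nodes, n_steps)
    A lag of k compares the pathogen signal at time t with the gene signal
    at time t + k. Returns tuples (pathogen_idx, gene_idx, best_lag, mi).
    Assumption: one scalar signal per node; real embeddings are vectors.
    """
    edges = []
    n_steps = gene_series.shape[1]
    for p, p_sig in enumerate(pathogen_series):
        for g, g_sig in enumerate(gene_series):
            scores = [
                mutual_information(p_sig[: n_steps - lag], g_sig[lag:], bins)
                for lag in range(max_lag + 1)
            ]
            best_lag = int(np.argmax(scores))
            if scores[best_lag] > threshold:   # keep only strong lagged dependence
                edges.append((p, g, best_lag, scores[best_lag]))
    return edges

# Synthetic demo: gene 0 copies the pathogen signal one step later, gene 1 is noise.
rng = np.random.default_rng(0)
pathogen = rng.normal(size=(1, 200))
genes = np.stack([np.roll(pathogen[0], 1), rng.normal(size=200)])
print(lagged_mi_edges(pathogen, genes))
```

In this sketch the selected lag doubles as a direction hint: a positive best lag suggests that the gene signal follows the pathogen signal. A sliding-window variant would apply the same function to each window and aggregate the per-window scores.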
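The cross-condition stability screening recited in claims 5 and 10 can likewise be sketched under stated assumptions. The claims only require that the variability of each edge weight across time windows and genetic-background subsets be below a preset threshold; the coefficient of variation used here is one possible variability measure, not necessarily the one intended. The function name `stable_core_edges` and the parameter `max_cv` are hypothetical.

```python
import numpy as np

def stable_core_edges(weights, max_cv=0.25, eps=1e-12):
    """Select edges whose weights are stable across conditions.

    weights: array of shape (n_edges, n_conditions); each column holds the
             edge weights measured in one time window or one
             genetic-background sample subset.
    Returns (mask, cv): a boolean mask of retained edges and the coefficient
    of variation (std / |mean|) of every edge.
    Assumption: variability is measured by the coefficient of variation.
    """
    weights = np.asarray(weights, dtype=float)
    mean = weights.mean(axis=1)
    std = weights.std(axis=1)
    cv = std / (np.abs(mean) + eps)   # eps guards against division by zero
    return cv < max_cv, cv

# Edge 0 is consistent across four conditions, edge 1 fluctuates strongly.
demo_weights = np.array([
    [0.80, 0.82, 0.79, 0.81],
    [0.90, 0.10, 0.50, 0.00],
])
mask, cv = stable_core_edges(demo_weights)
print(mask, cv)
```

An edge whose weight is zero in every condition also has zero variability under this measure, so a practical version would probably combine the stability test with a minimum mean weight.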
Description
Method and system for inferring a chicken gene-pathogen interaction network using a graph neural network
Technical Field
The invention relates to the technical field of data processing, and in particular to a method and a system for inferring a chicken gene-pathogen interaction network using a graph neural network.
Background
Using high-throughput sequencing data to resolve the interaction network between host and pathogen has become an important direction for studying disease mechanisms and guiding disease-resistance breeding. In large-scale poultry farming in particular, understanding the dynamic interactions between chicken genes and pathogens is key to epidemic prevention and control and to genetic improvement. This scenario involves multi-omics data collected continuously from chicken individuals, including gene transcripts, pathogen genomes, dynamic phenotypes, and pedigree information. Such data are inherently high-dimensional, heterogeneous, strongly correlated in time, and marked by significant genetic differences between individuals. Traditional research methods generally process data from different sources step by step, for example by independently constructing a gene co-expression network or a pathogen phylogenetic tree and then performing association matching based on statistical correlation or an existing biological knowledge base. Such methods have difficulty deeply fusing the co-variation patterns in the time dimension, and they lack an effective mechanism for distinguishing signals produced by stable biological interactions from interference introduced by random noise or by the genetic background of a specific individual. As a result, the finally inferred interaction network tends to be unstable and to generalize poorly, and the core interaction rules that hold across different genetic backgrounds are difficult to reveal.
Therefore, how to automatically learn and extract a cross-species gene-pathogen interaction network with genetic robustness and temporal consistency from raw multidimensional dynamic data has become a key technical problem to be solved urgently in the field.
Disclosure of Invention
The present invention aims to provide a method and a system for inferring a chicken gene-pathogen interaction network using a graph neural network, so as to solve the above problems. To achieve this purpose, the technical scheme adopted by the invention is as follows.
In a first aspect, the application provides a method for inferring a chicken gene-pathogen interaction network using a graph neural network, comprising: acquiring raw transcript sequencing read data of all chicken individuals at a plurality of consecutive time points, raw pathogen whole-genome sequencing read data of the corresponding chicken individuals at the same time points, raw continuous phenotype records of the chicken individuals at each time point, and raw records of the kinship relationships among the chicken individuals, wherein the raw transcript sequencing read data at the plurality of consecutive time points are acquired on the same preset time scale; performing cross-species unified graph structure construction based on the acquired raw transcript sequencing read data and raw pathogen whole-genome sequencing read data to obtain an initial gene-pathogen interaction graph that fuses time-series signals; performing dynamic graph representation learning under genetic constraints based on the initial gene-pathogen interaction graph, the raw continuous phenotype records, and the raw kinship records to obtain a spatio-temporal graph representation with genetic generalization capability; performing multi-scale topological structure optimization according to the spatio-temporal graph representation to obtain a candidate chicken gene-pathogen interaction network; and performing salient network generation according to the candidate interaction network, in which a preset graph autoencoding discrimination structure that maximizes mutual information automatically identifies and retains the edges whose importance for reconstructing the overall network response pattern is greater than a preset threshold, to obtain the finally inferred chicken gene-pathogen interaction network.
In a second aspect, the present application also provides a system for inferring a chicken gene-pathogen interaction network using a graph neural network, comprising: an acquisition unit configured to acquire raw transcript sequencing read data of all chicken individuals at a plurality of consecutive time points, raw pathogen whole-genome sequencing read data of the corresponding chicken individuals at the same time points, continuous phenoty