CN-121768485-B - High-order interaction prediction method and device with hybrid graph deep learning

Abstract

The invention provides a high-order interaction prediction method and device with hybrid graph deep learning. Based on multi-source heterogeneous data such as drug molecular structures, microorganism taxonomy information and a disease semantic network, the method constructs a drug molecular graph, a microorganism weighted graph, a disease weighted graph and a hypergraph connecting the three, which together form a hybrid graph structure. A hybrid graph deep learning module that fuses a graph convolutional network and a hypergraph neural network then extracts the nonlinear structural features and high-order interaction features of each entity, and an attention mechanism fuses these features adaptively. The fused deep features are then mapped to the prior expectations of the latent factor matrices in a Bayesian logistic tensor decomposition model, a probabilistic graphical model is constructed, and the model parameters, latent variables and deep learning mapping are jointly and adaptively inferred by a variational expectation-maximization algorithm, so that high-order association probabilities are predicted over the full tensor space without negative sampling.
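
The attention-based fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each entity type has several equally shaped feature matrices (for example one from the graph convolutional network and one from the hypergraph network), that one learnable scalar score is attached to each matrix, and that the prior expectation is the Softmax-weighted sum of the matrices. The function names `softmax` and `fuse_features` and the toy values are hypothetical and are not taken from the patent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(feature_views, attention_scores):
    """Softmax-weighted sum of equally shaped feature matrices.

    feature_views: list of matrices (lists of rows), one per feature source.
    attention_scores: one scalar per view (assumed learnable elsewhere).
    """
    if len(feature_views) != len(attention_scores):
        raise ValueError("need exactly one attention score per feature view")
    weights = softmax(attention_scores)
    n_rows = len(feature_views[0])
    n_cols = len(feature_views[0][0])
    fused = [[0.0] * n_cols for _ in range(n_rows)]
    for weight, view in zip(weights, feature_views):
        for i in range(n_rows):
            for j in range(n_cols):
                fused[i][j] += weight * view[i][j]
    return fused


# Toy example: two drugs, two feature dimensions, two feature sources.
gcn_view = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical GCN features
hgnn_view = [[0.0, 2.0], [2.0, 0.0]]   # hypothetical hypergraph features
prior_mean = fuse_features([gcn_view, hgnn_view], [0.0, 0.0])
# Equal scores give equal weights, so prior_mean is the plain average:
# [[0.5, 1.0], [1.0, 0.5]]
```

In a trained model the attention scores would be updated together with the network weights; here they are fixed only to keep the sketch self-contained.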

Inventors

Zhong Junjiang
Ma Yingjun
Hu Xianggao

Assignees

Xiamen University of Technology (厦门理工学院)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-28

Claims (7)

1. A high-order interaction prediction method with hybrid graph deep learning, comprising: extracting drug data, microorganism data, disease data and auxiliary information from a plurality of preset databases, and constructing a hybrid graph structure and an association tensor based on the extracted data; performing feature extraction on the hybrid graph structure using a graph convolutional network and a hypergraph convolutional network, introducing corresponding attention factors and a deep learning mapping, calculating prior expectations and multivariate Gaussian distributions, calculating a joint likelihood probability based on the association tensor, and combining the joint likelihood probability with the multivariate Gaussian distributions to obtain a joint probability model; executing an E-step, in which, with the prior expectations fixed, the posterior expectations and covariance matrices of the latent factor matrices are inferred in combination with the association tensor; executing an M-step, in which, with the posterior expectations and covariance matrices fixed, the joint probability model parameters and the deep learning mapping are updated using an Adam optimizer, and alternately executing the E-step and the M-step until an optimal balance is reached, obtaining the optimal posterior distribution of all latent factor matrices, and extracting the final posterior expectation of each latent factor matrix as the final joint probability model output; calculating, for any drug-microorganism-disease triplet, a predicted association probability from the final posterior expectations, and sorting the predicted association probabilities in descending order to generate a recommendation list; wherein extracting drug data, microorganism data, disease data and auxiliary information from a plurality of preset databases, and constructing a hybrid graph structure and an association tensor based on the extracted data, specifically comprises: extracting the drug data, microorganism data, disease data and auxiliary information from the plurality of preset databases; converting the SMILES strings in the drug data into drug molecular graphs, each comprising an atomic attribute matrix and an adjacency matrix A, where R is the feature dimension, n is the number of atoms, and the atomic attribute matrix is characterized by the atomic attribute dimension; constructing a microorganism weighted graph based on the microorganism data and a disease weighted graph based on the disease data, the microorganism weighted graph being defined by microorganism vertices, microorganism edges and the weights corresponding to the microorganism edges, and the disease weighted graph being defined by disease vertices, disease edges and the weights corresponding to the disease edges; constructing a DMD hypergraph from the drug data, microorganism data, disease data and auxiliary information, the DMD hypergraph being defined by a vertex set containing all drugs, microorganisms and diseases, and by a hyperedge set; constructing the hybrid graph structure from the drug molecular graph, the microorganism weighted graph, the disease weighted graph and the DMD hypergraph, and defining a hypergraph incidence matrix whose entry for the v-th vertex and the e-th hyperedge indicates whether the v-th vertex belongs to the e-th hyperedge; and establishing a drug-microorganism-disease association tensor with dimensions I × J × K, where I is the number of drugs, J is the number of microorganisms, and K is the number of diseases.
2. The high-order interaction prediction method with hybrid graph deep learning according to claim 1, wherein performing feature extraction on the hybrid graph structure using a graph convolutional network and a hypergraph convolutional network, introducing corresponding attention factors and a deep learning mapping, calculating prior expectations and multivariate Gaussian distributions, calculating a joint likelihood probability based on the association tensor, and combining the joint likelihood probability with the multivariate Gaussian distributions to obtain a joint probability model, specifically comprises: for the microorganism weighted graph and the disease weighted graph, applying multi-layer GCN propagation to obtain propagation features, wherein the propagation features of layer t+1 are computed from the degree matrix, the learnable parameter matrix of layer t, the propagation features of layer t, a nonlinear activation function, and, for the microorganisms, the phylogenetic similarity matrix of the microorganisms; applying a graph convolutional network to the drug molecular graph, performing global max pooling to aggregate the atom-level features into a drug-level feature vector, and combining the feature vectors and the propagation features to obtain GCN features; applying a hypergraph convolutional network to the DMD hypergraph, performing vertex-hyperedge-vertex information propagation, and extracting hypergraph convolutional network features, wherein the propagation rule is defined in terms of the vertex degree matrix, the hyperedge degree matrix, the hyperedge weight matrix, the transpose T, the drug features of layer t and a projection matrix, and yields the drug features of layer t+1; fusing the drug features in the GCN features with the drug features in the hypergraph convolutional network features to obtain a drug feature set, introducing learnable drug attention factors, one for each of the drug features in the set, and normalizing them with a Softmax function, which is based on the exponential function, to obtain the drug prior expectation; fusing the microorganism features in the GCN features with the microorganism features in the hypergraph convolutional network features to obtain a microorganism feature set, and introducing learnable microorganism attention factors, one for each of the microorganism features in the set, to obtain the microorganism prior expectation; and fusing the disease features in the GCN features with the disease features in the hypergraph convolutional network features to obtain a disease feature set, and introducing learnable disease attention factors, one for each of the disease features in the set, to obtain the disease prior expectation.
3. The high-order interaction prediction method with hybrid graph deep learning according to claim 2, further comprising: letting G, H and W denote the drug latent factor matrix, the microorganism latent factor matrix and the disease latent factor matrix respectively, and using a single symbol to denote the set of latent factor matrices; introducing a deep learning mapping F and, based on the drug prior expectation, the microorganism prior expectation and the disease prior expectation, calculating the multivariate Gaussian distribution of the drug latent factor matrix G, which is defined in terms of a diagonal matrix, the i-th row of the drug prior expectation, the i-th row of the drug latent factor matrix G, the normal distribution, and the joint probability model parameters; calculating the multivariate Gaussian distribution of the microorganism latent factor matrix H, defined in terms of the j-th row of the microorganism prior expectation and the j-th row of the microorganism latent factor matrix H; calculating the multivariate Gaussian distribution of the disease latent factor matrix W, defined in terms of the k-th row of the disease prior expectation and the k-th row of the disease latent factor matrix W; calculating, based on the association tensor, the joint likelihood probability of the association tensor, which is defined in terms of an importance-level parameter C, the (i, r)-th element of the drug latent factor matrix G, the (j, r)-th element of the microorganism latent factor matrix H, the (k, r)-th element of the disease latent factor matrix W, and the (i, j, k)-th element of the association tensor; combining the joint likelihood probability of the association tensor with the multivariate Gaussian distributions to obtain the joint probability model, which is expressed in terms of the set of prior expectations, the joint likelihood function of the association tensor and the latent factor matrices given the parameters, the likelihood function of the association tensor, and the likelihood function of the latent factor matrices given the parameters; adaptively estimating the joint probability model parameters and the set of prior expectations by optimizing the log-likelihood, the estimate being expressed in terms of the optimal hyperparameters of the model, the optimal set of projection mappings of the model, the likelihood function of the association tensor, and a differential over all latent factor matrices; and inferring the posterior probability distribution of the set of latent factor matrices, which is expressed in terms of the joint likelihood function of the parameters and variables based on the optimal hyperparameters and projections, and the likelihood function based on the optimal hyperparameters and projections.
4. The high-order interaction prediction method with hybrid graph deep learning according to claim 3, wherein executing the E-step, in which, with the prior expectations fixed, the posterior expectations and covariance matrices of the latent factor matrices are inferred in combination with the association tensor, specifically comprises: calculating, by variational inference and the mean-field approximation, the posterior expectation and the covariance matrix of the i-th row of the drug latent factor matrix G, which follows a multivariate Gaussian distribution, the calculation being expressed in terms of the Khatri-Rao product, the Hadamard product, the posterior expectation of the disease latent factor matrix W, the posterior expectation of the microorganism latent factor matrix H, products of matrices with their transposes, the i-th row of the corresponding matrix, the mode-1 matricization of the association tensor, and the ijk-th local variational parameter; calculating the posterior expectation and the covariance matrix of the j-th row of the microorganism latent factor matrix H, which follows a multivariate Gaussian distribution, the calculation being expressed in terms of products of matrices with their transposes, the j-th row of the corresponding matrix, the mode-2 matricization of the association tensor, and the posterior expectation of the drug latent factor matrix G; calculating the posterior expectation and the covariance matrix of the k-th row of the disease latent factor matrix W, which follows a multivariate Gaussian distribution, the calculation being expressed in terms of the corresponding row of the corresponding matrix and the mode-3 matricization of the association tensor; wherein the ijk-th local variational parameter satisfies a relation expressed using the generalized inner product, the expectation operator, and the product of the latent factors.
5. The high-order interaction prediction method with hybrid graph deep learning according to claim 4, wherein executing the M-step, in which, with the posterior expectations and covariance matrices fixed, the joint probability model parameters and the deep learning mapping are updated using an Adam optimizer, and alternately executing the E-step and the M-step until an optimal balance is reached, obtaining the optimal posterior distribution of all latent factor matrices, and extracting the final posterior expectation of each latent factor matrix as the final joint probability model output, specifically comprises: updating the joint probability model parameters by back-propagation using the Adam optimizer, wherein the variational approximate posterior is fixed and the ELBO is maximized with respect to the joint probability model parameters and the set of prior expectations, the ELBO being expressed in terms of the evidence lower bound, the expectation with respect to the posterior q, the r-th latent factor matrix, the trace operation, a quadratic function of G, H, W and a further quantity, the exponential of that quadratic function, the local variational parameters, and a constant; fixing the set of prior expectations, differentiating the evidence lower bound with respect to the joint probability model parameters and setting the derivative equal to 0 to obtain the parameters, and maximizing the evidence lower bound to obtain the objective function for computing the set of prior expectations, the objective function being expressed in terms of the elements on the diagonals of the corresponding matrices; combining the posterior expectations of the latent factor matrices of a given iteration with the objective function, and obtaining the updated prior expectations using the Adam optimizer; and alternately executing the E-step and the M-step until an optimal balance is reached, obtaining the optimal posterior distribution of all latent factor matrices, and extracting the final posterior expectation of each of the latent factor matrices G, H and W as the final joint probability model output.
6. The high-order interaction prediction method with hybrid graph deep learning according to claim 5, wherein the predicted association probability is calculated according to the following formula: [formula not reproduced in this text].
7. A high-order interaction prediction device with hybrid graph deep learning, comprising: a data extraction unit for extracting drug data, microorganism data, disease data and auxiliary information from a plurality of preset databases, and constructing a hybrid graph structure and an association tensor based on the extracted data; a prior expectation calculation unit for performing feature extraction on the hybrid graph structure using a graph convolutional network and a hypergraph convolutional network, introducing corresponding attention factors and a deep learning mapping, calculating prior expectations and multivariate Gaussian distributions, calculating a joint likelihood probability based on the association tensor, and combining the joint likelihood probability with the multivariate Gaussian distributions to obtain a joint probability model; an E-step unit for executing an E-step, in which, with the prior expectations fixed, the posterior expectations and covariance matrices of the latent factor matrices are inferred in combination with the association tensor; an M-step unit for executing an M-step, in which, with the posterior expectations and covariance matrices fixed, the joint probability model parameters and the deep learning mapping are updated using an Adam optimizer, and for alternately executing the E-step and the M-step until an optimal balance is reached, obtaining the optimal posterior distribution of all latent factor matrices, and extracting the final posterior expectation of each latent factor matrix as the final joint probability model output; and a prediction unit for calculating, for any drug-microorganism-disease triplet, a predicted association probability from the final posterior expectations, and sorting the predicted association probabilities in descending order to generate a recommendation list; wherein extracting drug data, microorganism data, disease data and auxiliary information from a plurality of preset databases, and constructing a hybrid graph structure and an association tensor based on the extracted data, specifically comprises: extracting the drug data, microorganism data, disease data and auxiliary information from the plurality of preset databases; converting the SMILES strings in the drug data into drug molecular graphs, each comprising an atomic attribute matrix and an adjacency matrix A, where R is the feature dimension, n is the number of atoms, and the atomic attribute matrix is characterized by the atomic attribute dimension; constructing a microorganism weighted graph based on the microorganism data and a disease weighted graph based on the disease data, the microorganism weighted graph being defined by microorganism vertices, microorganism edges and the weights corresponding to the microorganism edges, and the disease weighted graph being defined by disease vertices, disease edges and the weights corresponding to the disease edges; constructing a DMD hypergraph from the drug data, microorganism data, disease data and auxiliary information, the DMD hypergraph being defined by a vertex set containing all drugs, microorganisms and diseases, and by a hyperedge set; constructing the hybrid graph structure from the drug molecular graph, the microorganism weighted graph, the disease weighted graph and the DMD hypergraph, and defining a hypergraph incidence matrix whose entry for the v-th vertex and the e-th hyperedge indicates whether the v-th vertex belongs to the e-th hyperedge; and establishing a drug-microorganism-disease association tensor with dimensions I × J × K, where I is the number of drugs, J is the number of microorganisms, and K is the number of diseases.
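
For illustration only, the hypergraph incidence matrix and the association tensor recited in claims 1 and 7 could be assembled as in the following sketch. It assumes that every known drug-microorganism-disease triplet forms one hyperedge, that incidence entries are 1 for membership and 0 otherwise, and that vertices are ordered as drugs, then microorganisms, then diseases. These choices, the function name `build_dmd_structures` and the toy triplets are hypothetical; the claims also allow auxiliary information to contribute hyperedges, which this sketch does not cover.

```python
def build_dmd_structures(n_drugs, n_microbes, n_diseases, triplets):
    """Build a hypergraph incidence matrix and a binary association tensor.

    triplets: list of (drug_index, microbe_index, disease_index) tuples,
    each treated here as one hyperedge (an assumption of this sketch).
    Returns (incidence, tensor) as nested lists.
    """
    n_vertices = n_drugs + n_microbes + n_diseases
    # incidence[v][e] == 1 when vertex v belongs to hyperedge e, else 0.
    incidence = [[0] * len(triplets) for _ in range(n_vertices)]
    # tensor[i][j][k] == 1 when the triplet (i, j, k) is a known association.
    tensor = [[[0] * n_diseases for _ in range(n_microbes)]
              for _ in range(n_drugs)]
    for e, (i, j, k) in enumerate(triplets):
        incidence[i][e] = 1                          # drug vertex
        incidence[n_drugs + j][e] = 1                # microorganism vertex
        incidence[n_drugs + n_microbes + k][e] = 1   # disease vertex
        tensor[i][j][k] = 1
    return incidence, tensor


# Toy example: 2 drugs, 2 microorganisms, 2 diseases, 2 known triplets.
known = [(0, 1, 0), (1, 0, 1)]
incidence, tensor = build_dmd_structures(2, 2, 2, known)
```

Each column of `incidence` then contains exactly three ones, one per entity type, and `tensor` has shape I × J × K with one nonzero entry per known triplet.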

Description

High-order interaction prediction method and device with hybrid graph deep learning

Technical Field

The invention relates to the technical field of bioinformatics and computational biology, in particular to a high-order interaction prediction method and device with hybrid graph deep learning.

Background

In biomedical research, the human microbiota (microbiome) has been shown to be closely related to a variety of human diseases (e.g., obesity, diabetes and inflammatory bowel disease). Meanwhile, microorganisms play a key role in regulating the efficacy and toxicity of drugs, and drug intervention can in turn alter the composition and function of the microbial community. Therefore, systematically exploring the potential associations among drugs, microorganisms and diseases (Drug-Microorganism-Disease, DMD) has important clinical significance for understanding disease pathogenesis, promoting drug development, and enabling early diagnosis and precision medicine. At present, methods for identifying associations among drugs, microorganisms and diseases fall into two major categories: biological experimental methods and computational prediction methods. Traditional biological experimental methods, such as wet-lab experiments, are regarded as the "gold standard" for establishing associations, but they rely on expensive laboratory equipment, are limited by the difficulty of fully reproducing the complex environment of the human body in vitro, and require substantial time and large numbers of case samples for clinical observation. Given the massive number of candidate drugs and microorganism species, large-scale screening by traditional experiments alone is impractical, being both inefficient and costly. To compensate for the shortcomings of experimental approaches, computational prediction methods are widely used to integrate multi-source information and predict potential associations.
Existing computational prediction methods mainly comprise tensor decomposition-based methods and deep learning-based methods. In tensor decomposition-based approaches, researchers model the drug-microorganism-disease ternary relationship as a tensor and use tensor decomposition techniques, such as CANDECOMP/PARAFAC decomposition and non-negative tensor decomposition, to complete the missing relationships. However, most existing tensor decomposition methods integrate auxiliary information through a linear model, which makes it difficult to mine the complex nonlinear relationships that are common in biological data. Furthermore, such methods typically involve a large number of hyperparameters (e.g., the tensor rank), and model performance is highly sensitive to them, making it difficult to adapt to complex and varied data structures. Although some studies have proposed logistic tensor decomposition to introduce nonlinearity, limitations remain in fusing heterogeneous networks. Among deep learning-based methods, with the development of graph neural networks, methods based on graph convolutional networks (GCN) or hypergraph neural networks (HGNN/HGCN) are used to extract features of biological entities. Such methods learn node embeddings by building a network or hypergraph. However, existing deep learning methods typically require negative sampling to construct the training set. Since an unknown association does not mean that the association is absent (i.e., false negatives may exist), the noise introduced by random negative sampling may reduce prediction accuracy. Furthermore, the supervised learning paradigm has difficulty covering the whole sample space, and single-view deep learning models are prone to information bias.
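
As a point of reference for the tensor decomposition approaches discussed above, the following sketch shows how a CANDECOMP/PARAFAC-style model with a logistic link scores every entry of a third-order tensor from three factor matrices, so that all triplets can be ranked without drawing negative samples. It illustrates the general model only, not the inference procedure of any particular method; the helper names `cp_logit`, `sigmoid` and `rank_all_triplets` and the toy factor values are hypothetical.

```python
import math


def cp_logit(G, H, W, i, j, k):
    """CP score of entry (i, j, k): sum over rank r of G[i][r]*H[j][r]*W[k][r]."""
    return sum(g * h * w for g, h, w in zip(G[i], H[j], W[k]))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rank_all_triplets(G, H, W):
    """Score every (i, j, k) in the full tensor and sort by probability."""
    scored = []
    for i in range(len(G)):
        for j in range(len(H)):
            for k in range(len(W)):
                prob = sigmoid(cp_logit(G, H, W, i, j, k))
                scored.append(((i, j, k), prob))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Toy rank-2 factors: 2 drugs, 1 microorganism, 2 diseases.
G = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 1.0]]
W = [[2.0, 0.0], [0.0, -1.0]]
ranked = rank_all_triplets(G, H, W)
# The highest-scoring triplet is (0, 0, 0) with logit 2.0,
# and the lowest is (1, 0, 1) with logit -1.0.
```

Because the loop visits all I × J × K entries, the cost grows with the full tensor size; the sketch trades efficiency for clarity.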
In summary, when handling high-order drug-microorganism-disease association prediction, the prior art has difficulty simultaneously achieving deep mining of nonlinear features, avoidance of the bias caused by negative sampling, and adaptive inference of model parameters. In view of this, the present application is proposed.

Disclosure of Invention

The invention provides a high-order interaction prediction method, device, equipment and medium with hybrid graph deep learning, which can at least partially alleviate the above problems. To achieve the above purpose, the invention adopts the following technical scheme: A high-order interaction prediction method with hybrid graph deep learning, comprising: extracting drug data, microorganism data, disease data and auxiliary information from a plurality of preset databases, and constructing a hybrid graph structure and an association tensor based on the extracted data; performing feature extraction on the hybrid graph structure using a graph convolutional network and a hypergraph convolutional network, introducing corresponding attention factors and a deep learning mapping, calculating prior expectations and multivariate Gauss