CN-122024851-A - Biomolecule relation modeling method based on semantic consistency hypergraph contrast learning

Abstract

The invention discloses a biomolecule relation modeling method based on semantic consistency hypergraph contrast learning, which models complex relations among biomolecules by combining local structure with global semantic information. First, a message aggregation mechanism guided by extended subgraphs enriches the local representation of each node, and global context information is incorporated to improve the accuracy of biomolecular relation modeling. Second, the hypergraph is reconstructed under a two-level consistency mechanism that applies semantic and structural constraints at both the node level and the hyperedge level, preserving the consistency of local structure and semantics. In addition, soft clustering with a Gaussian mixture model (GMM), guided by a Bayesian information criterion (BIC) strategy, further refines hyperedge selection and ensures structural consistency during reconstruction. To keep local and global semantics consistent, the method introduces multi-granularity contrastive objectives into the contrastive learning framework, training at the node, hyperedge, and extended-subgraph levels, and feeds high-level semantic information back to the lower levels through a cross-layer feedback mechanism, which stabilizes the model and strengthens semantic alignment between representations. Experimental results show that the proposed framework performs well on node classification and clustering tasks across multiple benchmark datasets and, compared with existing methods, better captures and aligns local and global semantic relations, markedly improving hypergraph learning.
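
For reference, the two standard quantities behind a GMM soft clustering with BIC-based selection can be written as below. This is the textbook form only; the abstract does not give the patent's own notation or exact formulas, so every symbol here (z_i for the embedding of node i, K for a candidate number of clusters, N for the number of nodes, p_K for the number of free parameters) is an assumption made for illustration.

```latex
% Membership (posterior) probability of node i for cluster k
% under a K-component Gaussian mixture with weights \pi_k
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(z_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(z_i \mid \mu_j, \Sigma_j)}

% BIC of the fitted K-component model: p_K free parameters,
% maximized likelihood \hat{L}_K, N nodes; \mathcal{K} is the candidate set
\mathrm{BIC}(K) = p_K \ln N - 2 \ln \hat{L}_K,
\qquad
K^{*} = \arg\min_{K \in \mathcal{K}} \mathrm{BIC}(K)
```

Under this convention a lower BIC is better: the ln N term penalizes additional mixture components, so the selected K* trades goodness of fit against model size.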

Inventors

  • MA YUEFENG
  • WANG GUOLI
  • SHANG JUNLIANG
  • LIU YONGLI
  • QIU DEFU
  • DAI LIN
  • MA WENBO

Assignees

  • Qufu Normal University (曲阜师范大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-01

Claims (8)

  1. A biomolecule relation modeling method based on semantic consistency hypergraph contrast learning, characterized by comprising the following steps: constructing a biomolecule hypergraph and initializing its features; obtaining embedded representations of the nodes, hyperedges, and extended subgraphs of the original hypergraph through a multi-layer message aggregation encoder guided by extended subgraphs; determining the optimal clustering based on Gaussian mixture model soft clustering and the Bayesian information criterion, and reconstructing a semantically enhanced hypergraph view under combined semantic and structural constraints; performing contrastive learning between the original view and the reconstructed view at three granularities, namely nodes, hyperedges, and extended subgraphs, and introducing a cross-layer semantic feedback mechanism to achieve multi-level semantic alignment; and jointly optimizing the multi-level contrastive loss and the semantic feedback loss to train the encoder, and outputting the final embedded representations of the biomolecules.
  2. The method of claim 1, wherein the biomolecule hypergraph is constructed by taking each biomolecular entity as a hypergraph node and each known biological functional unit, interaction complex, or biological pathway as a hyperedge comprising all molecular nodes involved in that function or interaction.
  3. The method of claim 1, wherein the extended subgraph is defined as a substructure containing a target node, its first-order neighbor nodes, and all hyperedges associated with these nodes, and wherein the features of the extended subgraph are obtained by aggregating the mean features of the neighbor nodes in the extended subgraph with the mean features of the associated hyperedges and applying a nonlinear transformation.
  4. The method of claim 1, wherein: soft clustering is performed on the node embeddings using a Gaussian mixture model, and the optimal number of clusters is selected from a set of candidate cluster numbers based on the Bayesian information criterion, yielding the membership probability of each node for each cluster; nodes are screened according to the membership probabilities and a preset semantic threshold to form initial semantic clusters, and nodes are further screened within each initial semantic cluster based on a structural similarity index between nodes and a preset structural threshold to generate candidate reconstructed hyperedges; and the local coverage of each candidate reconstructed hyperedge is calculated, and a reconstructed hypergraph whose number of hyperedges is comparable to that of the original hypergraph is generated through a hyperedge-count maintenance strategy.
  5. The method of claim 4, wherein the hyperedge-count maintenance strategy is: if the number of candidate reconstructed hyperedges is greater than the number of original hyperedges, retaining the candidates with the highest local coverage, as many as there are original hyperedges; otherwise, selecting the hyperedges with the highest local coverage from the original hyperedges as supplements, so that the total number of reconstructed hyperedges equals the number of original hyperedges.
  6. The method of claim 1, wherein the multi-level contrastive learning comprises: calculating a node-level contrastive loss between the embeddings of the same node in the original view and the reconstructed view; matching, for each hyperedge in the original view, the semantically closest hyperedge in the reconstructed view, and calculating a hyperedge-level contrastive loss; and calculating a subgraph-level contrastive loss between the embedding distributions of all extended subgraphs in the original view and the reconstructed view.
  7. The method of claim 1, wherein the cross-layer semantic feedback mechanism is specifically: calculating the semantic center of each cluster from the optimal soft clustering result; combining the semantic centers of all clusters, weighted by the membership probabilities of each node, to obtain a semantic-consistency reconstructed representation of that node; and constructing a cross-layer semantic alignment loss by minimizing the distance between the original embedding of the node and its semantic-consistency reconstructed representation.
  8. The method of claim 1, wherein the total loss function combines the node-level, hyperedge-level, and subgraph-level contrastive losses with the cross-layer semantic alignment loss, the balance between the terms being set by an adjustable balance coefficient.
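
The hyperedge-count maintenance strategy of claim 5 and the cross-layer alignment loss of claim 7 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, local coverage scores are taken as given inputs because the claims do not define how they are computed, the cluster center is assumed to be the membership-weighted mean of the node embeddings, the distance is assumed to be squared Euclidean, and skipping duplicate hyperedges during supplementation is an added assumption.

```python
def maintain_hyperedge_count(candidates, originals, target):
    """Return exactly `target` hyperedges when enough distinct ones exist.

    candidates, originals: lists of (frozenset_of_nodes, local_coverage).
    Coverage scores are assumed to be precomputed.
    """
    ranked = sorted(candidates, key=lambda e: e[1], reverse=True)
    if len(ranked) >= target:
        # Enough candidates: keep the ones with the highest local coverage.
        return [nodes for nodes, _ in ranked[:target]]
    kept = [nodes for nodes, _ in ranked]
    # Too few candidates: supplement with the best-covered original hyperedges.
    for nodes, _ in sorted(originals, key=lambda e: e[1], reverse=True):
        if len(kept) == target:
            break
        if nodes not in kept:  # assumption: do not add a duplicate hyperedge
            kept.append(nodes)
    return kept


def semantic_alignment_loss(embeddings, memberships):
    """Mean squared distance between each node embedding and its
    membership-weighted combination of cluster centers.

    embeddings: n lists of d floats; memberships: n lists of k probabilities.
    Assumes every cluster has non-zero total membership.
    """
    n, d, k = len(embeddings), len(embeddings[0]), len(memberships[0])
    centers = []
    for c in range(k):
        mass = sum(memberships[i][c] for i in range(n))
        centers.append([
            sum(memberships[i][c] * embeddings[i][j] for i in range(n)) / mass
            for j in range(d)
        ])
    total = 0.0
    for i in range(n):
        # Semantic-consistency reconstruction of node i from the cluster centers.
        recon = [sum(memberships[i][c] * centers[c][j] for c in range(k))
                 for j in range(d)]
        total += sum((a - b) ** 2 for a, b in zip(embeddings[i], recon))
    return total / n
```

With hard (one-hot) memberships and one node per cluster, each node coincides with its own cluster center and the loss is zero; with uniform memberships every node is pulled toward the global mean, which gives the largest penalty for well-separated embeddings.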

Description

Biomolecule relation modeling method based on semantic consistency hypergraph contrast learning

Technical Field

The invention relates to hypergraph neural networks and their applications, and in particular to a biomolecule relation modeling method based on semantic consistency hypergraph contrast learning, which aims to model complex relations among biomolecules accurately by combining local and global semantic information.

Background

With the rapid development of bioinformatics and cheminformatics, effectively modeling the interactions among biomolecules and their structural and functional relationships has become an important topic in biological research. Existing molecular relation modeling methods rely mainly on traditional graph models or deep learning, but these methods are limited in modeling high-order structural information and in capturing global semantic consistency. A hypergraph is a mathematical model whose hyperedges naturally represent relations among a group of entities, which makes it a well-suited framework for modeling high-order biomolecular interactions. In recent years, hypergraph neural networks (HGNN) and hypergraph contrastive learning techniques have been developed to learn low-dimensional embedded representations of molecular nodes.

However, existing methods still face significant challenges: (1) separation of local and global semantics: most methods focus on local neighborhood structure during message passing and lack effective alignment with the global semantic context, so the learned node representations may suffer from semantic drift; (2) rigidity of cluster-based reconstruction: existing clustering-based data augmentation or reconstruction methods adopt hard partitioning strategies and cannot capture the ambiguity and uncertainty of biomolecular function assignment; and (3) single contrast granularity: existing hypergraph contrastive learning methods focus on node-level alignment and cannot simultaneously impose consistency constraints at multiple semantic granularities such as hyperedges and local substructures, which limits the discriminative power of the learned representations. A new method is therefore needed to resolve the consistency problem between local structure and global semantics and to handle complex molecular relations efficiently.

Disclosure of Invention

In view of the above, and to overcome the shortcomings of the prior art, the invention provides a biomolecule relation modeling method based on semantic consistency hypergraph contrast learning. The method learns low-dimensional embedded representations of the biomolecular nodes in a hypergraph that preserve both local structural similarity and global functional semantic consistency, thereby improving the performance of downstream tasks such as molecular function prediction, interaction inference, and cluster analysis.

To achieve the above purpose, the invention adopts the following technical scheme: a biomolecule relation modeling method based on semantic consistency hypergraph contrast learning, characterized by comprising the following steps:

1. Hypergraph construction and initialization: each molecular entity in the collection of biomolecules to be analyzed is treated as a hypergraph node, and each known functional unit, interaction complex, or biological pathway is treated as a hyperedge containing all molecular nodes involved in that function or interaction, thereby constructing an initial hypergraph consisting of a set of nodes and a set of hyperedges; the node representations are initialized using initial molecular features or structural features.

2. Extended-subgraph-guided multi-layer message aggregation: for each target node, an extended subgraph is constructed that comprises the node, its first-order neighbor nodes, and all hyperedges associated with these nodes. A multi-layer encoder performs iterative message passing among nodes, hyperedges, and extended subgraphs, aggregating local structure information and contextual semantics to obtain the node embeddings, hyperedge embeddings, and extended-subgraph embeddings of the original hypergraph.

3. Semantic-consistency hypergraph reconstruction based on soft clustering: the node embeddings are soft-clustered using a Gaussian mixture model, and the Bayesian information criterion is used to determine the optimal number of clusters dynamically, yielding a membership probability matrix that gives the probability of each node belonging to each cluster. Based on the membership probabilities, combined with a preset semantic threshold and a structural similarity threshold, the nodes are softly assigned to form a plurality of candidate node clusters. Candidate node clusters are screened or supplemented through a hyperedge-count maintenance strategy to generate a reconstructed hypergraph that is comparable with the original hypergr