CN-121983171-A - Method, device, processor and readable storage medium for realizing GPCR receptor class target drug discovery based on model consensus mechanism
Abstract
The invention relates to a method for realizing GPCR target drug discovery based on a model consensus mechanism. The method comprises: constructing composite training data from a general data set and a dedicated data set; constructing a cascading subtask model that sequentially performs interaction prediction, binding strength prediction and functional activity prediction; capturing interaction features through dual-view encoding and an interaction module and outputting a standardized prediction score; and generating unified confidence and consistency metrics for the interaction prediction, affinity prediction and functional activity prediction tasks to obtain a final screening result. The method, device, processor and computer readable storage medium realize hierarchical inference on protein-small molecule interactions through cascaded multi-task modeling and multi-scale feature fusion, which reduces single-model bias and improves prediction accuracy, and they use the model consensus mechanism to automatically screen out unstable predictions, which improves the robustness and generalization ability of the results.
Inventors
- YOU XIAOYU
- LI LIANG
- JIANG QINGCHAO
- DU LEI
- WANG RONGCHAO
- XIE JINGLI
- ZHAO SHUXIN
Assignees
- East China University of Science and Technology (华东理工大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-22
Claims (13)
- 1. A method for realizing target drug discovery of GPCR receptors based on a model consensus mechanism, characterized by comprising the following steps: (1) constructing composite training data from a general data set comprising interaction data across protein families and a dedicated data set comprising interaction data for a specific GPCR target; (2) establishing a cascading subtask model divided into an interaction layer, an affinity layer and a functional activity layer, which sequentially perform interaction prediction, binding strength prediction and functional activity prediction, wherein the hidden features of each subtask layer are used as input to the model of the next layer, forming a progressive structure consistent with the biological mechanism; (3) constructing a multi-scale characterization model based on protein and small molecule inputs, capturing interaction features through dual-view encoding and an interaction module, and outputting a standardized prediction score; (4) constructing an information-theory-driven model consensus result from the cascaded multi-model outputs, generating unified confidence and consistency metrics for the interaction prediction, affinity prediction and functional activity prediction tasks, and obtaining a final screening result.
- 2. The method for realizing target drug discovery of GPCR receptors based on model consensus mechanism according to claim 1, wherein the step (1) specifically comprises the following steps: (1.1) obtaining the general data set from the BindingDB and PDBbind v2020 databases, and processing the BindingDB, PDBbind and GLASS data sets to obtain the GPCR-bind and GPCR-ECIC data sets as the dedicated data sets; (1.2) selecting different data sets for training according to the different task models, retaining IC50 and EC50 data, removing duplicate pairs and unresolved compounds, and deleting duplicated threshold values and signed values.
- 3. The method for realizing target drug discovery of GPCR receptors based on model consensus mechanism according to claim 1, wherein the cascading subtask model in step (2) is divided into three subtasks, namely interaction prediction, binding strength prediction and functional activity prediction, each layer of subtasks is trained independently, and the corresponding model is as follows: [formula] wherein t is the task level, [symbol] denotes the prediction output of model k for the protein target P and the small molecule C, and the model outputs are unified as [symbol]; the three subtasks form a progressive structure whose mapping relation satisfies: [formula] wherein [symbol], [symbol] and [symbol] denote the prediction outputs of the k-th model in the interaction prediction, binding strength prediction and functional activity prediction tasks, respectively, and [symbol], [symbol] and [symbol] denote the number of model branches for the respective tasks.
- 4. The method for realizing target drug discovery of GPCR receptors based on model consensus mechanism according to claim 1, wherein the step (3) specifically comprises the following steps: (3.1) determining the multi-scale modeling target corresponding to each subtask, and constructing a plurality of independent AI branch models for each subtask, wherein the branch models adopt a Y-shaped architecture comprising a small-molecule encoder, a protein encoder and an interaction module; and (3.2) obtaining multi-scale features of the protein and the small molecule through dual-view encoding, wherein the protein is encoded from a sequence convolution view and a graph-structure view, and the small molecule is encoded from a SMILES convolution view and a molecular graph view.
- 5. The method for implementing GPCR receptor class target drug discovery based on model consensus mechanism according to claim 4, wherein in the step (3.1), a plurality of independent AI branch models are constructed for each subtask, specifically: the Y-shaped architecture of the branch model is obtained according to the following formula: [formula] where t represents the task level, m represents the model number under that level, [symbol] is a small-molecule encoder that maps the sequence features or graph-structure features of an input small molecule C into a hidden vector [symbol]; and [symbol] is a protein encoder that encodes protein sequence or structural information to obtain a hidden representation [symbol].
- 6. The method for realizing the target drug discovery of GPCR receptors based on model consensus mechanism according to claim 4, wherein the step (3.2) is divided into modules of different dimensions, specifically an interaction layer, an affinity layer and a functional activity layer, wherein: the interaction layer constructs a multi-scale interaction representation from the structural features and sequence features of the protein and the small molecule and predicts whether the protein and the small molecule bind; the affinity layer, building on the output of the interaction layer, performs regression prediction of the binding strength between the protein and the small molecule by combining molecule-level global features with local interaction features; and the functional activity layer reuses the global feature representation obtained by the affinity layer and predicts the functional response of the compound to the target through independent prediction branches.
- 7. The method for realizing target drug discovery of GPCR receptor class based on model consensus mechanism as claimed in claim 6, wherein the interaction layer is divided into branch [symbol] and branch [symbol]; branch [symbol], based on the graph-structure representations of the protein and the small molecule, models interactions between atom or residue pairs and aggregates the local interaction information by attention weighting to obtain interaction features that represent local physical contact and fine-grained matching; branch [symbol], based on the sequence representations of the protein and the small molecule, establishes an alignment between the protein and the small molecule at the sequence level through a cross-attention mechanism to obtain global interaction features that describe overall semantic matching and long-range dependence; the affinity layer is divided into branch [symbol] and branch [symbol]; branch [symbol], based on sequence-level global representations of the protein and the small molecule, extracts and fuses global features through convolution and nonlinear mapping to obtain affinity features that reflect the overall binding tendency of the molecule; branch [symbol], based on the graph-structure features, obtains affinity features that reflect local energy contributions by aggregating local interaction information; the functional activity layer comprises branch [symbol]; branch [symbol] reuses the global encoder of the affinity layer as its base, is trained separately on the dedicated data set, and predicts the inhibitory activity [symbol] and the agonistic activity [symbol] through two independent regression heads, thereby realizing multi-branch learning at the task level.
- 8. The method for realizing target drug discovery of GPCR receptors based on model consensus mechanism according to claim 1, wherein the step (4) specifically comprises the following steps: (4.1) adding a model consensus mechanism on top of the cascade model, and performing a consensus calculation to obtain the intra-group average of the multi-model confidences; and (4.2) defining an empirical consistency entropy to measure the divergence among models, and calculating an uncertainty metric, namely the measure of divergence among the models in the unified confidence space.
- 9. The method for achieving GPCR receptor class target drug discovery based on model consensus mechanism according to claim 1, wherein the step (4) uses three complementary screening strategies, namely hierarchical screening, two-stage screening and global screening, based on the consensus score and uncertainty metric, and obtains the candidate compound set through at least one screening strategy.
- 10. The method for realizing target drug discovery of GPCR receptors based on model consensus mechanism according to claim 1, wherein the hierarchical screening specifically comprises: performing three layers of screening in sequence through interaction prediction, binding strength prediction and functional activity prediction, and retaining at each layer the samples with high consensus scores and low uncertainty; the two-stage screening specifically comprises: combining the interaction prediction and binding strength prediction layers to select samples with high consensus scores and low uncertainty, and then ranking and screening them by functional activity prediction score; and the global screening specifically comprises: selecting high-confidence samples independently on each of the three subtasks, and taking the intersection.
- 11. A device for realizing target drug discovery of GPCR receptors based on a model consensus mechanism, which is characterized by comprising: A processor configured to execute computer-executable instructions; a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method of implementing GPCR receptor class target drug discovery based on a model consensus mechanism as claimed in any one of claims 1 to 10.
- 12. A processor for implementing GPCR receptor class target drug discovery based on a model consensus mechanism, wherein the processor is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the method for implementing GPCR receptor class target drug discovery based on the model consensus mechanism of any one of claims 1 to 10.
- 13. A computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method of any one of claims 1 to 10 for achieving GPCR receptor class target drug discovery based on a model consensus mechanism.
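The consensus and screening steps of claims 8 to 10 can be illustrated with a minimal Python sketch. The claims do not reproduce the exact formulas, so everything below is an assumption made for illustration: each branch model is assumed to emit a confidence in [0, 1], the consensus score is the plain mean across models, and the "consistency entropy" is approximated by a Jensen-Shannon-style disagreement (binary entropy of the mean minus the mean binary entropy). The function names, the thresholds `min_score` and `max_unc`, and the `top_k` cut-off are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a Bernoulli variable with probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def consensus_and_uncertainty(confidences):
    """confidences: one list per model, holding one score in [0, 1] per compound.

    Returns (consensus, uncertainty), one value per compound.
    Assumption: uncertainty = H(mean confidence) - mean of H(confidence).
    """
    n_models = len(confidences)
    consensus, uncertainty = [], []
    for per_compound in zip(*confidences):
        mean = sum(per_compound) / n_models
        mean_entropy = sum(binary_entropy(p) for p in per_compound) / n_models
        consensus.append(mean)
        # Clamp tiny negative values caused by floating-point rounding.
        uncertainty.append(max(binary_entropy(mean) - mean_entropy, 0.0))
    return consensus, uncertainty


def passes(score, unc, min_score, max_unc):
    """High consensus score and low uncertainty."""
    return score >= min_score and unc <= max_unc


def hierarchical_screen(layers, min_score, max_unc):
    """layers: list of (consensus, uncertainty) pairs in cascade order.

    Each layer only examines the compounds that survived the previous layer.
    """
    survivors = list(range(len(layers[0][0])))
    for scores, uncs in layers:
        survivors = [i for i in survivors
                     if passes(scores[i], uncs[i], min_score, max_unc)]
    return survivors


def two_stage_screen(layers, min_score, max_unc, top_k):
    """Filter jointly on the first two layers, then rank by activity score."""
    kept = hierarchical_screen(layers[:2], min_score, max_unc)
    activity_scores = layers[2][0]
    ranked = sorted(kept, key=lambda i: activity_scores[i], reverse=True)
    return ranked[:top_k]


def global_screen(layers, min_score, max_unc):
    """Screen each task independently, then take the intersection."""
    per_task = [
        {i for i in range(len(scores))
         if passes(scores[i], uncs[i], min_score, max_unc)}
        for scores, uncs in layers
    ]
    return sorted(set.intersection(*per_task))
```

With a single shared threshold pair, hierarchical and global screening select the same compounds in this sketch. They differ once each layer is given its own thresholds, or when the later layers are evaluated only on survivors to save computation.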
Description
Method, device, processor and readable storage medium for realizing GPCR receptor class target drug discovery based on model consensus mechanism
Technical Field
The invention relates to the technical field of neural networks, in particular to virtual screening, and specifically to a method, a device, a processor and a computer readable storage medium for realizing G protein-coupled receptor (GPCR) target drug discovery based on a model consensus mechanism.
Background
In drug discovery, rapidly identifying molecules that are highly active against a target and predicting protein-ligand interactions are central to an efficient discovery process. Obtaining data on target-drug interaction strength quickly makes it possible to optimize the molecular structure of a drug in time and shorten the development cycle, which helps improve the success rate of drug development, reduce the risk of clinical failure and control development costs. However, owing to the complexity of biological systems, the often obscure mechanisms of action of targets and the limitations of traditional experimental methods, comprehensive target-drug interaction information is difficult to obtain rapidly and at high throughput. An efficient and accurate technical means is therefore urgently needed to break through this bottleneck. Against this background, AI-based virtual screening has become a highly promising solution: a prediction model is built by integrating structural biology information with drug molecule features, and its ability to rapidly predict target affinity and functional activity has led to wide application and acceptance in drug discovery and targeted drug development.
In small molecule drug development, computational prediction methods can generally be divided into physics-based models and data-driven models. With the development of deep learning and structural biology, data-driven virtual screening is becoming an important direction in drug development. Traditional methods such as random forests, support vector machines (SVM) and quantitative structure-activity relationship (QSAR) models based on molecular fingerprints achieve reasonable results on specific targets. However, because protein-small molecule interactions are nonlinear, multi-level and multi-scale, a single-task or single-scale model can hardly describe the drug binding mechanism completely, and such methods predict poorly on new targets without labels. Local modeling or kernel methods can handle nonlinearity, but their generalization performance is unstable and they remain clearly limited in complex target settings. Cascaded subtask modeling offers hierarchical reasoning that is consistent with the biological mechanism: the biological process of "whether binding occurs, how strong the binding is, and what functional effect results" can be modeled as a sequentially advancing multi-task structure that better simulates the real process of drug action. Meanwhile, multi-scale artificial intelligence models rely on cross-scale representations at the sequence level, the graph structure level and the three-dimensional geometry level, so they can capture global conformational information and local microscopic interactions at the same time. With this feature expression and cross-scale fusion capability, multi-scale deep models show higher prediction accuracy and generalization ability when screening drugs against complex targets.
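The progressive "binds, how strongly, with what effect" structure described above can be sketched as a chain of stages in which each stage receives the hidden features of the previous one. This is a toy illustration under stated assumptions, not the network of the invention: the `Stage` class, the single tanh layer, the hand-set weights in the usage note, and the choice to concatenate the raw pair features with the previous hidden vector are all hypothetical.

```python
import math


def sigmoid(x):
    """Squash a raw score into (0, 1), used here for the interaction stage."""
    return 1.0 / (1.0 + math.exp(-x))


def identity(x):
    """Leave a raw score unchanged, used here for the regression stages."""
    return x


class Stage:
    """One cascade stage: a tanh hidden layer followed by a scalar head.

    hidden_weights: one row of weights per hidden unit (hypothetical values).
    head_weights:   one weight per hidden unit.
    squash:         output transform applied to the raw head score.
    """

    def __init__(self, hidden_weights, head_weights, squash):
        self.hidden_weights = hidden_weights
        self.head_weights = head_weights
        self.squash = squash

    def __call__(self, inputs):
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in self.hidden_weights
        ]
        raw = sum(w * h for w, h in zip(self.head_weights, hidden))
        return hidden, self.squash(raw)


def run_cascade(stages, pair_features):
    """Run interaction -> affinity -> activity stages in order.

    Assumption: every stage after the first sees the raw protein/ligand pair
    features concatenated with the hidden vector of the stage before it.
    Returns one score per stage.
    """
    inputs = list(pair_features)
    scores = []
    for stage in stages:
        hidden, score = stage(inputs)
        scores.append(score)
        inputs = list(pair_features) + hidden
    return scores
```

For example, with two pair features and two hidden units per stage, the first stage needs weight rows of length 2 and the later stages rows of length 4; calling `run_cascade(stages, [1.0, 2.0])` then returns three scores, one per task level.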
Although cascaded and multi-scale modeling significantly improves performance, the protein-small molecule interaction data involved in drug development are markedly heterogeneous and complexly distributed, and different targets differ significantly in sequence, structure and interaction mechanism, which often undermines the robustness and reliability of a single deep learning model on new targets. Existing methods such as Bayesian neural networks, transfer learning and meta-learning can partly improve generalization, but a unified uncertainty measurement system across different model families and task types is lacking, and stacking multiple models increases the computational cost. In contrast, by introducing an information-theory-driven model consensus mechanism and a multi-strategy screening framework, the sources of uncertainty can be analyzed comprehensively from multiple prediction perspectives, the reliability of the model on new targets is improved and redundant computation is reduced, and therefore a more robust and efficient drug scre