CN-117198382-B - Method, device and equipment for predicting the activity of chemically modified siRNA
Abstract
The invention discloses a method, device and equipment for predicting the activity of chemically modified siRNA. The method comprises: obtaining the original base sequence and chemical modification information of a target siRNA; retrieving the physicochemical properties of the target siRNA; performing feature encoding on the original base sequence, the chemical modification information and the physicochemical properties; and generating a silencing efficiency prediction result for the target siRNA from the encoded features using a pre-constructed prediction model. The pre-constructed prediction model comprises a feature fusion sub-model and a classification sub-model: the feature fusion sub-model fuses the encoded features based on a cross-attention mechanism, and the classification sub-model generates the silencing efficiency prediction result from the fused features. The method builds on a multidimensional, multi-view learning strategy and an attention-based fusion model, which markedly improves the algorithm's ability to represent chemically modified siRNA data. Combined with the nonlinear fitting capability of a deep learning framework, this improves the accuracy with which the algorithm predicts the activity of chemically modified siRNA drugs.
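The encoding and fusion steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the modification vocabulary `MODS`, the helper names, and the choice to omit learned query/key/value projections are all assumptions made for illustration. The sketch one-hot encodes the base sequence and the per-position modifications as two views, then lets the sequence view attend to the modification view with scaled dot-product cross-attention.

```python
import math

BASES = ["A", "C", "G", "U"]
# ASSUMPTION: hypothetical modification vocabulary, for illustration only.
MODS = ["none", "2-O-methyl", "2-fluoro", "phosphorothioate"]


def one_hot(symbol, vocabulary):
    """Encode one symbol as a one-hot vector over the given vocabulary."""
    return [1.0 if symbol == entry else 0.0 for entry in vocabulary]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one view and
    keys/values from another view.

    ASSUMPTION: learned projection matrices are omitted, so the raw
    encodings are used directly as queries, keys and values.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical 4-nucleotide example; real siRNA strands are 21-23 bases long.
sequence = "AUGC"
modifications = ["none", "2-O-methyl", "none", "2-fluoro"]

seq_features = [one_hot(base, BASES) for base in sequence]
mod_features = [one_hot(mod, MODS) for mod in modifications]

# The sequence view queries the modification view.
fused = cross_attention(seq_features, mod_features, mod_features)
```

Each row of `fused` is a weighted mixture of the modification encodings, one row per sequence position. A trained model would additionally learn the projections that produce queries, keys and values, and would include the physicochemical properties as a further view.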
Inventors
- LUO DELUN
- ZHANG YANG
- LIU TIANYUAN
Assignees
- 成都景润泽基因科技有限公司
Dates
- Publication Date: 20260505
- Application Date: 20230725
Claims (10)
- 1. A method for predicting the activity of chemically modified siRNA, comprising: acquiring the original base sequence and chemical modification information of a target siRNA, and retrieving the physicochemical properties of the target siRNA based on the original base sequence; performing feature encoding on the original base sequence, the chemical modification information and the physicochemical properties, and generating a silencing efficiency prediction result of the target siRNA according to the encoded features by adopting a pre-constructed prediction model; wherein the pre-constructed prediction model comprises a feature fusion sub-model and a classification sub-model, the feature fusion sub-model is used for carrying out feature fusion on the encoded features based on a cross-attention mechanism, and the classification sub-model is used for generating the silencing efficiency prediction result according to the fused features.
- 2. The method of claim 1, wherein the chemical modification information comprises sense strand chemical modification information and antisense strand chemical modification information.
- 3. The method of claim 1, wherein the physicochemical properties comprise molecular weight, XLogP, number of hydrogen bond donors, number of hydrogen bond acceptors, exact mass, monoisotopic mass, topological surface area, number of heavy atoms, complexity, and defined bond stereocenter count.
- 4. The method for predicting the activity of chemically modified siRNA according to claim 1, wherein the classification sub-model is a two-layer convolutional neural network comprising a two-layer convolutional sub-network, two fully connected layers and an output layer.
- 5. The method of claim 4, wherein each layer of the convolution sub-network comprises a convolution layer, a ReLU activation function, and a max pooling layer.
- 6. The method of claim 5, wherein the convolution layer uses a convolution kernel of 3x3, a step size of 1, and a 1-pixel fill.
- 7. The method for predicting the activity of chemically modified siRNA according to any one of claims 1 to 6, wherein the prediction model is constructed in advance by a method comprising: establishing an initial model of the prediction model, wherein the initial model comprises a feature fusion sub-model and a classification sub-model; obtaining the original base sequences, chemical modification information and silencing efficiency of a plurality of chemically modified siRNA drugs from the siRNAmod database; constructing data samples based on the chemical modification information, physicochemical properties and silencing efficiency of the plurality of chemically modified siRNA drugs, and generating a sample data set; and training the initial model based on the sample data set, and obtaining the prediction model when the model satisfies the convergence condition.
- 8. A chemically modified siRNA activity prediction device, comprising: an acquisition unit for acquiring an original base sequence and chemical modification information of a target siRNA; a retrieval unit for retrieving physicochemical properties of the target siRNA based on the original base sequence; an encoding unit for performing feature encoding on the original base sequence, the chemical modification information and the physicochemical properties; and a prediction unit for generating a silencing efficiency prediction result of the target siRNA according to the encoded features by adopting a pre-constructed prediction model, wherein the pre-constructed prediction model comprises a feature fusion sub-model and a classification sub-model, the feature fusion sub-model is used for carrying out feature fusion on the encoded features based on a cross-attention mechanism, and the classification sub-model is used for generating the silencing efficiency prediction result according to the fused features.
- 9. The chemically modified siRNA activity prediction device according to claim 8, further comprising a storage unit for storing physicochemical properties of a base sequence; the retrieval unit is configured to retrieve physicochemical properties of the target siRNA from the storage unit based on the original base sequence.
- 10. An electronic device comprising a processor, a network interface, and a memory, wherein the processor, the network interface, and the memory are interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the chemically modified siRNA activity prediction method of any of claims 1 to 8.
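The layer geometry recited in claims 4 to 6 can be checked with a short shape calculation. The sketch below is illustrative only: the 20x20 input size and the 2x2 pooling window with stride 2 are assumptions, since the claims specify the convolution kernel (3x3, stride 1, padding 1) but not the input dimensions or the pooling size. It uses the standard output-size formula `(size + 2 * padding - kernel) // stride + 1`.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension.

    Defaults follow claim 6: 3x3 kernel, stride 1, 1-pixel padding.
    """
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial output size of max pooling along one dimension.

    ASSUMPTION: 2x2 window with stride 2; the claims do not state
    the pooling size.
    """
    return (size - window) // stride + 1


def trace_shapes(height, width, blocks=2):
    """Return the (height, width) after each conv and pool stage
    of a stack of conv -> ReLU -> max-pool blocks (ReLU keeps the shape)."""
    shapes = [(height, width)]
    for _ in range(blocks):
        height, width = conv_output_size(height), conv_output_size(width)
        shapes.append((height, width))
        height, width = pool_output_size(height), pool_output_size(width)
        shapes.append((height, width))
    return shapes


# ASSUMPTION: hypothetical 20x20 fused feature map as input.
shapes = trace_shapes(20, 20)
```

With a 3x3 kernel, stride 1 and 1-pixel padding, each convolution preserves the spatial size, so under these assumptions only the pooling layers shrink the feature map (20 to 10 to 5 here) before the fully connected layers.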
Description
Method, device and equipment for predicting the activity of chemically modified siRNA
Technical Field
The invention relates to the technical field of siRNA activity prediction, and in particular to a method, a device and equipment for predicting the activity of chemically modified siRNA.
Background
Small interfering RNA (siRNA) is a double-stranded non-coding RNA 21-23 bases in length. In an organism, the siRNA-mediated RNA interference (RNAi) pathway can specifically degrade mRNA and silence genes. siRNA can therefore be designed as a nucleotide drug that specifically targets pathogenic genes and treats disease at the mRNA level. Traditional drugs fall mainly into two classes, small molecule drugs and antibody drugs, both of which exert their therapeutic effect by targeting proteins. However, fewer than 700 target proteins have been identified to date, which greatly limits traditional drug development. Compared with traditional drugs, siRNA drugs offer a wide range of targets, high specificity and a short development cycle. In recent years, five siRNA drugs have reached the market, and several other therapies are in clinical trials. Although siRNA drugs have important social and strategic significance, problems such as drug delivery, off-target effects and immunotoxicity seriously affect drug activity and limit drug development and clinical application. RNA modification is the chemical modification of RNA. A growing body of evidence shows that adding chemical modifications to the phosphate, ribose or base of siRNA can improve the stability, specificity and safety of siRNA drugs, thereby improving their activity.
Therefore, rational chemical modification is key to ensuring the druggability of siRNA. Early chemical modification templates, designed from the prior knowledge of drug developers, have a high failure rate in practice and are costly in both time and money. Using machine learning and related methods to build high-performance models that assist in predicting the activity of chemically modified siRNA drugs has therefore become a practical need in drug development. Many methods have been developed to predict or design chemical modifications with respect to siRNA drug activity. According to the model and strategy employed, existing methods fall into two categories (as shown in FIG. 1): (1) rule-based prediction and (2) machine-learning-based prediction. Rule-based siRNA modification designs mainly include standard template chemistry (STC) and enhanced stabilization chemistry (ESC), among others, which researchers design from modification rules observed in experimental data and continually update after clinical validation (Friedrich, M. et al., BioDrugs 2022, 36, 549-571). Designs that follow such established rules offer better targeting and specificity and make the synthesis of siRNA molecules simpler and more efficient. However, rule-based design requires complex calculation and analysis and demands a high level of expertise from researchers. In addition, the observed modification rules are biased toward particular cells or tissues, so the failure rate in practice is high and extensive experiments and analysis are needed.
Only two machine-learning-based methods currently exist for predicting siRNA chemical modification: SMEpred (Dar, S. A. et al., RNA Biol 2016, 13, 1144-1151), which is based on a support vector machine (SVM), and the method of Dong et al. (Dong et al., Molecules 2022, 27), which is based on partial least squares (PLS) regression. These methods learn the relationship between known siRNA molecule data and drug activity (i.e., silencing efficiency) and then use it to predict the silencing efficiency of new siRNA molecules. Machine learning methods are flexible, are not constrained by hand-crafted rules, can automate design and improve research efficiency; the resulting models are robust, and the data set can include experimental data from different tissues and cells. However, both SMEpred and Dong et al. use feature extraction to characterize the antisense strand of the siRNA and use the result as the initial input layer of the model. Both consider only a limited set of features during feature extraction and ignore other features that affect drug activity, such as RNA sequence information, so the information extracted by the corresponding models lacks diversity, the model efficacy is limited by the ch