CN-115862736-B - Mitochondrial genetic variation sequencing method and system based on neural network

Abstract

The invention discloses a neural network-based method and system for ordering (ranking) mitochondrial genetic variations. The method comprises: obtaining a variation result after gene sequencing; obtaining, according to the variation result and an HPO (Human Phenotype Ontology) list, a feature vector for each mitochondrial genetic variation in a sample to be tested; inputting these feature vectors into a trained neural network model to obtain a prediction result for each mitochondrial genetic variation in the sample; and sorting the prediction results. The invention addresses two problems: the existing manual variant analysis workflow is inefficient, and existing ordering algorithms are not fully suited to mitochondrial genetic variation analysis. It improves the efficiency of mitochondrial genetic variation analysis and strengthens the association of the ordering results with clinical symptoms and variant pathogenicity.

Inventors

PENG LIMIN
LEI PENG
ZHANG SHAOWEI
JIANG YANHUANG
YU SHUOJUN

Assignees

人和未来生物科技（长沙）有限公司

Dates

Publication Date: 20260505
Application Date: 20221121

Claims (9)

1. A neural network-based mitochondrial genetic variation ordering method, characterized by comprising the following steps: obtaining a variation result after gene sequencing; obtaining, according to the variation result and an HPO list, a feature vector corresponding to each mitochondrial genetic variation in a sample to be tested, wherein the feature vector corresponding to each mitochondrial genetic variation comprises: a variation type feature vector obtained by converting the variation type into a binary vector; an amino acid change feature vector obtained by converting the standard variant notation into a vector; a population frequency feature vector representing the maximum frequency of the variation across multiple population databases; a variation heterogeneity feature vector representing the proportion of mitochondrial DNA carrying the variation; a maternal genetic background feature vector representing whether the mother carries the same variation; a database record feature vector representing whether the variation is recorded in various mitochondrial databases; a prediction scoring feature vector obtained by scoring the deleteriousness prediction scores of different variation types; and a phenotype association feature vector representing the degree of match between the phenotypes associated with the variation and the HPO list input by a user; and inputting the feature vector corresponding to each mitochondrial genetic variation in the sample to be tested into a trained neural network model, obtaining a prediction result corresponding to each mitochondrial genetic variation in the sample to be tested, and sorting the prediction results.
2. The neural network-based mitochondrial genetic variation ordering method of claim 1, wherein converting the variation type into a binary vector comprises: obtaining the number of variation types of the mitochondrial genetic variation; and converting the variation type into a binary vector according to which type it is, wherein the dimension of the binary vector is equal to the number of variation types.
3. The neural network-based mitochondrial genetic variation ordering method of claim 1, wherein scoring the deleteriousness prediction scores of different variation types comprises: obtaining a deleteriousness prediction score for each variation type in the mitochondrial genetic variation; normalizing the deleteriousness prediction score of each variation type: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, wherein $x'$ represents the normalized deleteriousness prediction score, $x$ represents the deleteriousness prediction score, $x_{\min}$ represents the minimum value of the deleteriousness prediction score, and $x_{\max}$ represents the maximum value of the deleteriousness prediction score; and scoring the deleteriousness prediction scores of each variation type in the mitochondrial genetic variation by a formula that is not reproduced in this text, in which the score is computed from the set of normalized deleteriousness prediction scores of the different variation types; if the score of any variation type in the set is missing, the score of that variation type is recorded as 0.
4. The neural network-based mitochondrial genetic variation ordering method of claim 1, wherein calculating the variation heterogeneity feature vector comprises: removing duplicate sequences to obtain the sequencing depth of the variation; and calculating the variation heterogeneity feature vector from the sequencing depth of the variation: $H = \frac{D_{v}}{D_{t}}$, wherein $D_{v}$ represents the sequencing depth of the variation, $D_{t}$ represents the total sequencing depth, and $H$ represents the variation heterogeneity feature vector.
5. The neural network-based mitochondrial genetic variation ordering method of claim 1, wherein before the feature vector corresponding to each mitochondrial genetic variation in the sample to be tested is input into the trained neural network model, the method further comprises: obtaining a plurality of training samples with known variation results; obtaining, according to the variation result and the HPO list, a feature vector for each mitochondrial genetic variation in each training sample; labeling the feature vector corresponding to each mitochondrial genetic variation in the training samples to obtain a labeling result; constructing a training set and a validation set from the feature vectors and labeling results corresponding to each mitochondrial genetic variation in the training samples; and training a preset neural network model with the training set and the validation set to obtain the trained neural network model.
6. The neural network-based mitochondrial genetic variation ordering method of claim 1, wherein the neural network model comprises an input layer, an intermediate layer and an output layer, wherein the number of nodes of the input layer is the same as the total dimension of the feature vector of the sample, the number of nodes of the intermediate layer is greater than the number of nodes of the input layer, and the output layer comprises one node.
7. A neural network-based mitochondrial genetic variation ordering system, comprising: a variation result obtaining unit, configured to obtain a variation result after gene sequencing; a feature vector obtaining unit, configured to obtain, according to the variation result and an HPO list, a feature vector corresponding to each mitochondrial genetic variation in a sample to be tested, wherein the feature vector corresponding to each mitochondrial genetic variation comprises: a variation type feature vector obtained by converting the variation type into a binary vector; an amino acid change feature vector obtained by converting the standard variant notation into a vector; a population frequency feature vector representing the maximum frequency of the variation across multiple population databases; a variation heterogeneity feature vector representing the proportion of mitochondrial DNA carrying the variation; a maternal genetic background feature vector representing whether the mother carries the same variation; a database record feature vector representing whether the variation is recorded in various mitochondrial databases; a prediction scoring feature vector obtained by scoring the deleteriousness prediction scores of different variation types; and a phenotype association feature vector representing the degree of match between the phenotypes associated with the variation and the HPO list input by a user; and a prediction result obtaining unit, configured to input the feature vector corresponding to each mitochondrial genetic variation in the sample to be tested into a trained neural network model, obtain a prediction result corresponding to each mitochondrial genetic variation in the sample to be tested, and sort the prediction results.
8. A neural network-based mitochondrial genetic variation ordering apparatus comprising at least one control processor and a memory communicatively coupled to the at least one control processor, the memory storing instructions executable by the at least one control processor to enable the at least one control processor to perform the neural network-based mitochondrial genetic variation ordering method of any one of claims 1-6.
9. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the neural network-based mitochondrial genetic variation ordering method of any one of claims 1 to 6.
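
As an illustration of the steps recited in claims 2, 4 and 6, the following is a minimal Python sketch rather than the patented implementation. The variant-type vocabulary, the dictionary field names (`type`, `pop_freqs`, `alt_depth`, `depth`, `mother_carries`), the ReLU and sigmoid activations, the hidden-layer size and the random, untrained weights are all assumptions made for illustration, and only a subset of the claimed features is built.

```python
import math
import random

# Hypothetical variant-type vocabulary; the claims only require that the
# binary vector's dimension equals the number of variant types.
VARIANT_TYPES = ["missense", "synonymous", "tRNA", "rRNA", "frameshift"]


def one_hot(variant_type, vocabulary=VARIANT_TYPES):
    """Binary vector with one position per variant type (claim 2)."""
    return [1.0 if variant_type == t else 0.0 for t in vocabulary]


def heteroplasmy(variant_depth, total_depth):
    """Variant depth divided by total depth (claim 4); 0.0 if no coverage."""
    return variant_depth / total_depth if total_depth > 0 else 0.0


def max_population_frequency(frequencies):
    """Maximum frequency across population databases; 0.0 if none reported."""
    return max(frequencies, default=0.0)


def build_features(variant):
    """Concatenate a subset of the claimed features into one flat vector."""
    return (
        one_hot(variant["type"])
        + [max_population_frequency(variant["pop_freqs"])]
        + [heteroplasmy(variant["alt_depth"], variant["depth"])]
        + [1.0 if variant["mother_carries"] else 0.0]
    )


def init_mlp(n_in, n_hidden, seed=0):
    """Random weights: input -> wider hidden layer -> one output node (claim 6)."""
    assert n_hidden > n_in, "hidden layer must be wider than the input layer"
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w1, w2


def predict(weights, x):
    """Forward pass with an assumed ReLU hidden layer and sigmoid output."""
    w1, w2 = weights
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    z = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-z))


def rank_variants(weights, variants):
    """Score every variant and sort by descending predicted score."""
    scored = [(predict(weights, build_features(v)), v["id"]) for v in variants]
    return sorted(scored, reverse=True)


# Made-up example variants, not real data.
variants = [
    {"id": "variant_A", "type": "tRNA", "pop_freqs": [0.0001, 0.0],
     "alt_depth": 300, "depth": 1000, "mother_carries": True},
    {"id": "variant_B", "type": "synonymous", "pop_freqs": [0.12, 0.2],
     "alt_depth": 990, "depth": 1000, "mother_carries": True},
]
weights = init_mlp(n_in=len(VARIANT_TYPES) + 3, n_hidden=16)
ranking = rank_variants(weights, variants)
```

Because the weights here are random, the resulting order carries no biological meaning; in the claimed method the model would first be trained on labeled variants as described in claim 5.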

Description

Mitochondrial genetic variation sequencing method and system based on neural network

Technical Field

The invention relates to the technical field of mitochondrial gene detection, and in particular to a neural network-based method and system for ordering (ranking) mitochondrial genetic variations.

Background

Mitochondria produce 90% of the energy required by the human body and are important energy-metabolism organelles in human cells. Mitochondrial DNA is extranuclear genetic material; if mutated, it can lead to a variety of serious metabolic and neurological genetic diseases, e.g., Leigh syndrome and mitochondrial myopathy. It is estimated that one in every 5,000 people suffers from a hereditary mitochondrial disease, so the analysis of mitochondrial DNA variation has broad medical and social value.

Unlike the nuclear autosomes, mitochondrial DNA has some unique genetic characteristics, and the variant analysis procedure needs to be optimized specifically for it. For example, although mitochondrial DNA is only about 16 kb long, it can be present at from 10 to more than 1,000 copies per cell. Mitochondrial variation is heterogeneous (heteroplasmic), and a variant carried by only a few copies can still be pathogenic. In addition, mitochondrial DNA is inherited through the maternal line, and it lacks the intron-exon gene organization of nuclear genes.
Because of these differences, general variant analysis standards are not fully applicable to mitochondria. For example, ClinGen (the Clinical Genome Resource) has adapted the ACMG (American College of Medical Genetics and Genomics) genetic variant classification guidelines into mitochondria-specific guidance, the ClinGen Mito Disease ACMG Specifications. The changes include, but are not limited to: removing the PM3 criterion, which relates to dominant or recessive inheritance; removing the PM1 and PP2 criteria, which relate to mutational hot-spot regions; and modifying the PS2 criterion, which relates to de novo variants, to accommodate mitochondrial maternal inheritance. These theoretical and practical changes mean that the analysis procedure for autosomal variation is not entirely applicable to mitochondrial variation.

At a practical level, variant results need to be ranked (or scored) automatically. High-throughput sequencing can capture complete mitochondrial variation at low cost, but from an efficiency perspective it is not feasible to examine hundreds or thousands of variant results one by one and classify each for pathogenicity. Existing genetic variation ordering methods such as Exomiser rely on a weighted average of several pathogenicity assessment algorithms; the weight settings are highly subjective and become harder to balance as the number of input parameters grows, and Exomiser's parameters were chosen for autosomal variation. Furthermore, pathogenicity scoring methods for mitochondrial variation are not integrated with one another. For example, APOGEE predicts the deleteriousness of missense variants and MitoTIP predicts the deleteriousness of tRNA variants, but their scores are not comparable and cannot be placed directly in a single ordering.
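
One way to make scores from different predictors comparable is the min-max normalization recited in claim 3, with a missing score recorded as 0. The sketch below illustrates that idea only. The predictor names and score ranges are hypothetical placeholders, and returning one normalized value per predictor is an assumption, since the formula that combines the normalized scores is not reproduced in this text.

```python
def min_max_normalize(score, lo, hi):
    """Rescale a raw score to [0, 1] using the predictor's minimum and maximum."""
    if hi == lo:
        return 0.0  # degenerate range: no information to rescale
    return (score - lo) / (hi - lo)


# Hypothetical (min, max) raw score ranges per predictor; real tools define
# their own ranges, which would have to be looked up.
PREDICTOR_RANGES = {"missense_tool": (0.0, 1.0), "trna_tool": (0.0, 21.0)}


def prediction_feature(raw_scores, ranges=PREDICTOR_RANGES):
    """One normalized value per predictor, with 0.0 where a score is missing."""
    feature = []
    for name, (lo, hi) in ranges.items():
        raw = raw_scores.get(name)
        feature.append(0.0 if raw is None else min_max_normalize(raw, lo, hi))
    return feature


# A tRNA variant has no missense score, so that slot is recorded as 0.
trna_variant_feature = prediction_feature({"trna_tool": 10.5})
```

After normalization every predictor contributes a value on the same 0 to 1 scale, so a downstream model can weigh them without hand-set weights.
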
In addition to calculating variant pathogenicity, genetic variation ordering must also take phenotype matching into account, so that the analysis results fit each specific case more closely.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems existing in the prior art. To that end, the invention provides a neural network-based mitochondrial genetic variation ordering method and system, which address the low efficiency of the existing manual variant analysis workflow and the fact that existing ordering algorithms are not fully suited to mitochondrial genetic variation analysis, improve the efficiency of mitochondrial genetic variation analysis, and strengthen the correlation between the ordering results and clinical symptoms and variant pathogenicity.

In a first aspect, an embodiment of the present invention provides a neural network-based mitochondrial genetic variation ordering method, comprising: obtaining a variation result after gene sequencing; obtaining, according to the variation result and the HPO list, a feature vector corresponding to each mitochondrial genetic variation in the sample to be tested; and inputting the feature vector corresponding to each mitochondrial genetic variation in the sample to be tested into a trained neural network model, obtaining a prediction result corresponding to each mitochondrial genetic variation in the sample to be tested, and sorting the prediction results.

Compared with the prior art, the first aspect of the invention has the following beneficial effects: The method comprises the steps of obtaining a mutation result after gene sequencing, obtainin