CN-122025119-A - Neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning


Abstract

The invention discloses a neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning. The method comprises the following steps: collecting and preprocessing gene, metabolism and phenotype data to construct normalized multi-modal input features; constructing a cross-modal interaction network and extracting fusion features from the preprocessed data to enhance the expression of pathological information; constructing a metabolic pathway graph and modeling the structural and functional relationships among metabolites with a graph convolution network; and aligning the multi-modal features with the representation vector of the metabolic pathway graph and performing efficient multi-modal fusion reasoning based on a confidence-guided incremental attention mechanism, thereby completing the disease data processing. Through multi-modal data fusion and graph convolution network modeling, the invention enhances the expression of pathological information, achieves efficient fusion reasoning, and improves the accuracy and efficiency of neonatal disease data processing.

Inventors

LIN BO
Weng Chentian
YANG XIN
SHEN YAPING
Xu Xiaocha

Assignees

Zhejiang University (浙江大学)
Binjiang Institute of Zhejiang University (浙江大学滨江研究院)

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (8)

1. A neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning, characterized by comprising the following steps: step one, collecting and preprocessing gene, metabolism and phenotype data, and constructing normalized multi-modal input features; step two, constructing a cross-modal interaction network, and extracting fusion features from the data preprocessed in step one to enhance the expression of pathological information; step three, constructing a metabolic pathway graph and modeling the structural and functional associations among metabolites using a graph convolution network; and step four, performing disease risk level assessment based on multi-modal collaborative reasoning, thereby realizing hierarchical intelligent auxiliary decision making.
2. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to claim 1, wherein the specific steps of collecting and preprocessing gene, metabolism and phenotype data and constructing normalized multi-modal input features in step one are as follows: first, neonatal samples aged 1-28 days after birth are collected and stratified into 1-7 days and 8-28 days, with peripheral blood and urine as the core test materials; at the same time, medical texts comprising prenatal examination reports, delivery records and family genetic histories are obtained; finally, a cross-modal feature input set is constructed through time alignment and unique sample identification, the input set consisting of the gene vector, the metabolic vector, the phenotype vector and the medical text feature vector.
3. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to claim 2, wherein the gene, metabolism and phenotype data are preprocessed in step one as follows: for gene data, an embedding function maps the base variation pattern into a dense vector representation, thereby capturing the key genetic features of point mutations and indels, where the original input for each gene locus is its base variation information and mutation type, and the embedding function is optimized for the characteristics of gene sequences; for metabolite concentration data, normalization is performed in combination with sample age so that the concentrations are brought onto a common numerical scale, reducing the offset caused by inter-individual differences, where the original concentration value of each metabolite is standardized using the mean and the standard deviation of that metabolite in the training-set samples, and the resulting metabolite features are fed to the model on a unified scale so that differences in magnitude do not interfere with modeling; for phenotype descriptions, a clinical semantic embedding function is designed to extract, in a context-aware manner, numerical representations of typical symptoms such as jaundice and vomiting, where the input is the natural language description of each phenotype, the embedding function is a dedicated function trained on a clinical phenotype corpus that semantically enhances the characteristic symptoms of neonatal genetic metabolic diseases and performs context-aware encoding of the description to generate a dense semantic vector, and the output is the numerical vector representation of each phenotype, used for fusion modeling with the other modalities; for medical text, a three-dimensional mask matrix over the spatial, temporal and term levels removes redundant information, and a time decay factor and term-level weights enhance the expression of pathology-related key information, finally generating a low-dimensional dense vector, where the quantities involved are the q-th cleaned and standardized medical text, the text type weight, the time decay factor, the term-level weight, the semantic association between the text and the disease-causing metabolic pathways, a Bernoulli-distributed variable, and a row pruning function.
4. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to any one of claims 1 to 3, wherein the specific steps of constructing a cross-modal interaction network in step two and extracting fusion features from the data preprocessed in step one to enhance the expression of pathological information are as follows: first, based on the loci observed in the sample and their pathogenicity scores, the pathological association strength of the sample relative to the set of key pathogenic genes is calculated, where the set is the key pathogenic gene set of neonatal genetic metabolic diseases, the indicator function takes the value 1 or 0, each gene locus has a pathogenicity score, and the result reflects the strength of the pathological association between the sample and genetic metabolic disease; then, a dynamic-rank low-rank decomposition is performed, where the decomposed low-rank matrix has a dimension that varies dynamically, the F-norm measures the decomposition error, the KL divergence term carries its own weight, a pathological relevance submatrix of the gene-fusion dimension ensures that the decomposed matrix retains pathological relevance, and a rank constraint coefficient prevents the rank of the decomposed matrix from deviating; the metabolism matrix and the text matrix are decomposed by dynamic rank in the same way; finally, the fused gene-metabolism vector and the phenotype vector are concatenated to form a unified multi-modal feature representation, where the fusion uses an activation function and a gene-triggering indicator function, and the feature concatenation operation joins the fused gene-metabolism vector with the phenotype vector p, so that the final representation comprehensively encodes the pathological information of genes, metabolism, phenotype and medical text.
5. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to any one of claims 1 to 3, wherein the specific steps of constructing a metabolic pathway graph and modeling the structural and functional relationships among metabolites using a graph convolution network in step three are as follows: a metabolic pathway graph is established based on the KEGG database, where the node set represents the set of metabolites, each node corresponds to one metabolite, an edge indicates that one metabolite can be converted into another by enzyme action, and the edge weight is obtained by normalizing the enzymatic reaction rate constant, so that the higher the rate, the closer the weight is to 1; each node is initialized with the input vector of its metabolite, where a mapping matrix from medical text to metabolic features maps the text vector t into a low dimension, thereby encoding the clinical background into the node features, and the result is the initial feature vector of the node.
6. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to claim 5, wherein in step three iterative cluster operations are adopted to reduce the scale of the graph, specifically: cluster merging, in which two clusters are merged into a new cluster when the difference between their features is small; and cluster pruning, in which a cluster C is pruned according to the variance of its features, computed from the per-dimension means of the cluster (the i-th dimension mean of cluster C); a graph convolution iteration formula is then obtained, whose terms are the neighbor cluster set of cluster C, the pruned cluster set, the neighbor cluster weights, the parameter matrix of each layer, a perturbation term, and an activation function; and a graph feature aggregation formula is obtained, whose terms are the cluster size weights and the number of iteration layers L, the aggregated vector z comprehensively reflecting the abnormal state of the metabolic network.
7. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to any one of claims 1 to 3, wherein in step four the multi-modal features are aligned with the representation vector of the metabolic pathway graph, and efficient multi-modal fusion reasoning based on the confidence-guided incremental attention mechanism is implemented as follows: first, the individual vector and the graph representation vector are feature-aligned using learnable parameters, the aligned graph representation vector being matched to the dimension of h, with dimension-expansion adaptation ensuring compatibility for attention fusion; then, cross-modal interaction information is injected while the original features are retained, and disease prediction is performed; finally, the fused features are input to a multi-task classifier, which synchronously outputs the disease probability and the diagnosis confidence, where the classifier uses learnable parameters and the output is a vector whose length equals the number of categories, each element being the predicted probability of the corresponding class.
8. The neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning according to claim 7, wherein in step four, while the disease prediction result is output, a lightweight interpretation network is further designed to perform modality-probability association modeling between the original four-modality features and the prediction result, each of the four modalities having its own features and corresponding modality parameters; the absolute contribution of each modality is then obtained; the modality weights are normalized to obtain a relative comparison among the modalities; and finally the contributions are combined with a clinical template to output a natural language explanation.
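
The last steps of claim 8 (absolute contribution, normalization, template-based explanation) can be illustrated with a minimal Python sketch. The formulas of the claim are not reproduced in this text, so the sketch rests on stated assumptions: the absolute contribution is taken as the absolute value of a per-modality score, and normalization divides by the sum of the absolute contributions. The function names `relative_contributions` and `explain`, the example scores, and the template wording are hypothetical and are not taken from the patent.

```python
from typing import Dict


def relative_contributions(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-modality scores into relative contributions that sum to 1.

    Assumption: absolute contribution = |score|; relative contribution =
    absolute contribution / sum of absolute contributions.
    """
    absolute = {name: abs(value) for name, value in scores.items()}
    total = sum(absolute.values())
    if total == 0:
        # No modality contributed anything; report zeros rather than divide by 0.
        return {name: 0.0 for name in scores}
    return {name: value / total for name, value in absolute.items()}


def explain(scores: Dict[str, float],
            template: str = "The {top} modality contributed most ({share:.0%}) "
                            "to the predicted risk.") -> str:
    """Fill a (hypothetical) clinical text template with the top modality."""
    relative = relative_contributions(scores)
    top = max(relative, key=relative.get)
    return template.format(top=top, share=relative[top])


# Hypothetical scores for the four modalities named in the claims.
example_scores = {"gene": 2.0, "metabolism": -1.0, "phenotype": 0.5, "text": 0.5}
print(relative_contributions(example_scores))
print(explain(example_scores))
```

Dividing by the sum of absolute values keeps negative scores from cancelling positive ones, which is one plausible reading of "absolute contribution"; a softmax over the scores would be another option with different behavior.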

Description

Neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning

Technical Field

The invention relates to the field of medical data processing, in particular to a neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning.

Background

Genetic metabolic diseases are a major class of diseases in which gene mutations cause functional defects in proteins synthesized by the body, such as enzymes, receptors and carriers, leading to abnormalities in the synthesis, metabolism, transport and storage of biochemical substances and thereby to a series of clinical symptoms. Most genetic metabolic diseases are autosomal recessive. Although each individual disease is rare, their overall incidence is high. Thousands of genetic metabolic diseases have been identified so far, and they can be classified by the metabolic substances involved into disorders of protein, carbohydrate, lipid, nucleic acid, peroxisome, metal and other metabolism; they seriously affect the life expectancy and quality of life of affected children. Tandem mass spectrometry is currently the main means of screening newborns for genetic defects and is widely used to detect genetic abnormalities.

In the prior art, some disease prediction schemes focus on modeling a single type of metabolomics data: gene and phenotype information is not integrated, the age specificity of enzyme activity in neonatal metabolic pathways is not considered, and general-purpose metabolic parameters are adopted directly, which may bias predictions for neonatal samples. In addition, multi-source data fusion schemes use the conventional approach of concatenating features and feeding them directly into a model, without establishing pathological and causal relationships among the modalities. Phenotype encoding relies on general clinical text models that contain a large number of non-neonatal features and cannot adapt to the specific clinical phenotypes of neonatal genetic metabolic diseases. Such schemes struggle to meet the accuracy requirements of neonatal disease prediction and occupy a large amount of video memory during data processing.

Disclosure of Invention

The invention discloses a deep-neural-network-based neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning with low video memory requirements, which aims to improve the accuracy of early screening and diagnosis of genetic metabolic diseases, reduce video memory occupation through full-pipeline optimization, and adapt to clinical application scenarios.
To achieve the above aim, the invention provides the following technical scheme: a neonatal genetic metabolic disease risk assessment method based on multi-modal collaborative learning, comprising the following steps: step one, collecting and preprocessing gene, metabolism and phenotype data, and constructing normalized multi-modal input features; step two, constructing a cross-modal interaction network, and extracting fusion features from the data preprocessed in step one to enhance the expression of pathological information; step three, constructing a metabolic pathway graph and modeling the structural and functional relationships among metabolites using a graph convolution network; and step four, performing disease risk level assessment based on multi-modal collaborative reasoning, thereby realizing hierarchical intelligent auxiliary decision making.

The specific steps of collecting and preprocessing gene, metabolism and phenotype data and constructing normalized multi-modal input features in step one are as follows: first, neonatal samples aged 1-28 days after birth are collected and stratified into 1-7 days and 8-28 days, with peripheral blood and urine as the core test materials; at the same time, medical texts comprising prenatal examination reports, delivery records and family genetic histories are obtained; finally, a cross-modal feature input set is constructed through time alignment and unique sample identification, the input set consisting of the gene vector, the metabolic vector, the phenotype vector and the medical text feature vector.
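
The age stratification and the joining of modalities by unique sample identifier described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SampleInput`, `age_stratum` and `build_input_set` are hypothetical, each modality is assumed to be already encoded as a list of floats keyed by sample ID, samples missing any modality are simply dropped, and the time alignment step is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampleInput:
    """One entry of the cross-modal input set (field names are illustrative)."""
    sample_id: str
    age_days: int
    stratum: str
    gene: List[float]
    metabolism: List[float]
    phenotype: List[float]
    text: List[float]


def age_stratum(age_days: int) -> str:
    """Assign the age layer described in the text: 1-7 days or 8-28 days."""
    if 1 <= age_days <= 7:
        return "1-7d"
    if 8 <= age_days <= 28:
        return "8-28d"
    raise ValueError(f"age {age_days} days is outside the 1-28 day neonatal range")


Vectors = Dict[str, List[float]]


def build_input_set(ages: Dict[str, int], gene: Vectors, metabolism: Vectors,
                    phenotype: Vectors, text: Vectors) -> List[SampleInput]:
    """Join the four modalities on the unique sample ID.

    Assumption: samples missing any modality are dropped; the source does not
    say how incomplete samples are handled.
    """
    complete = set(ages) & set(gene) & set(metabolism) & set(phenotype) & set(text)
    return [
        SampleInput(sid, ages[sid], age_stratum(ages[sid]),
                    gene[sid], metabolism[sid], phenotype[sid], text[sid])
        for sid in sorted(complete)
    ]
```

Keeping the stratum on each record lets later steps, such as age-aware metabolite normalization, select statistics for the matching age layer without recomputing it.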
As a further improvement of the invention, the gene, metabolism and phenotype data are preprocessed in step one as follows: for gene data, an embedding function maps the base variation pattern into a dense vector representation, thereby capturing the key genetic features of point mutations and indels, where the original input for each gene locus is its base variation information and mutation type, and the embedding function is optimized for the characteristics of gene sequences; normalization is performed on the metabolite concentration dat