CN-121983323-A - Depression onset risk prediction method and system based on causal characterization learning
Abstract
The invention provides a depression onset risk prediction method and system based on causal characterization learning, relating to the technical field of intelligent healthcare. First, a multi-source implicit monitoring data set of a target object is acquired, comprising autonomic nervous function nonlinear micro-features, smart-terminal touch-behavior micro-features, environmental spatiotemporal big data, and non-depression-related medical visit trajectory data from an electronic health record. Causal variable level labeling is then performed on the data set. Next, a causal variational autoencoder network is invoked to extract a causal latent representation vector set, and a causal link graph model is constructed from that vector set and the causal variable level labels. Finally, individual multi-source time-series monitoring data acquired in real time are fused by causal weighting according to the causal link graph model and input into an ultra-early depression onset risk assessment model, which generates a depression onset risk output result containing a prediction time window identifier. The invention thereby achieves ultra-early, accurate prediction of depression onset risk.
Inventors
- LIU XUEMEI
- LUO BIN
- YUE XIAOBO
Assignees
- 四川互慧软件有限公司
Dates
- Publication Date: 20260505
- Application Date: 20260409
Claims (10)
- 1. A depression onset risk prediction method based on causal characterization learning, the method comprising: acquiring a multi-source implicit monitoring data set of a target object, wherein the multi-source implicit monitoring data set comprises the following continuously acquired data units: an autonomic nervous function nonlinear micro-feature data unit with a time-sequence identifier, a smart-terminal touch-behavior micro-feature data unit, an environmental spatiotemporal big data unit, and a non-depression-related medical visit trajectory data unit from an electronic health record; performing causal variable level labeling on the multi-source implicit monitoring data set according to a preset causal level division rule, to obtain the exogenous exposure factor level labels, endophenotype abnormality mediator level labels, and depression onset outcome level labels corresponding to the data units in the multi-source implicit monitoring data set; invoking a pre-constructed causal variational autoencoder network to perform causal latent representation extraction on the multi-source implicit monitoring data set carrying the causal variable level labels, and generating a causal latent representation vector set that is causally associated with depression onset, wherein each vector dimension in the causal latent representation vector set corresponds to a causal feature unit from which confounding variable bias has been eliminated; performing directed acyclic graph learning with penalty terms based on the causal latent representation vector set and the causal variable level labels, and constructing a causal link graph model comprising the causal conduction paths among the exogenous exposure factor level, the endophenotype abnormality mediator level, and the depression onset outcome level; and performing causal weighted fusion on individual multi-source time-series monitoring data acquired in real time, according to the causal effect weight corresponding to each causal feature unit in the causal link graph model, to obtain an individual causal weighted fusion feature time-series matrix, inputting the individual causal weighted fusion feature time-series matrix into an ultra-early depression onset risk assessment model for onset probability prediction, and generating a depression onset risk probability output result containing a prediction time window identifier.
- 2. The depression onset risk prediction method based on causal characterization learning according to claim 1, wherein performing causal variable level labeling on the multi-source implicit monitoring data set according to the preset causal level division rule, to obtain the exogenous exposure factor level labels, endophenotype abnormality mediator level labels, and depression onset outcome level labels corresponding to the data units in the multi-source implicit monitoring data set, comprises: parsing the original acquisition source identifier and the data structure type identifier of each data unit in the multi-source implicit monitoring data set; according to the original acquisition source identifier and the data structure type identifier, classifying the environmental spatiotemporal big data units and the non-depression-related medical visit trajectory data units from the electronic health record into the exogenous exposure factor level, and attaching an exogenous exposure factor level label to each data unit so classified; according to the original acquisition source identifier and the data structure type identifier, classifying the autonomic nervous function nonlinear micro-feature data units and the smart-terminal touch-behavior micro-feature data units into the endophenotype abnormality mediator level, and attaching an endophenotype abnormality mediator level label to each data unit so classified; acquiring a clinical depression diagnosis record of the target object corresponding to the multi-source implicit monitoring data set within a preset follow-up time window, generating a binary depression onset outcome level label from the clinical diagnosis record, and binding the depression onset outcome level label by time-sequence association to the data units at the corresponding time points in the multi-source implicit monitoring data set; performing a data integrity scan on the labeled multi-source implicit monitoring data set, identifying data units whose level label is missing or conflicting, and re-executing the level classification for those data units according to their original acquisition source identifiers and data structure type identifiers until every data unit carries a unique, conflict-free level label; arranging and integrating the fully labeled multi-source implicit monitoring data set along the time axis to generate a level-structured time-series data set containing, for each time point, an exogenous exposure factor level data subset, an endophenotype abnormality mediator level data subset, and a depression onset outcome level label; and extracting association features among the data units within each level of the level-structured time-series data set, constructing an initial cross-level causal path hypothesis set from the association features, and using the initial causal path hypothesis set as a prior constraint for the subsequent causal latent representation extraction.
- 3. The depression onset risk prediction method based on causal characterization learning according to claim 1, wherein invoking the pre-constructed causal variational autoencoder network to perform causal latent representation extraction on the multi-source implicit monitoring data set carrying the causal variable level labels, and generating the causal latent representation vector set causally associated with depression onset, comprises: inputting the multi-source implicit monitoring data set carrying the causal variable level labels into an encoder module of the causal variational autoencoder network, and performing dimension compression on the input data through a multi-layer nonlinear mapping network of the encoder module to generate initial latent representation vector distribution parameters, wherein the initial latent representation vector distribution parameters comprise a mean vector and a log-variance vector; acquiring a preset confounding variable set comprising an age identifier, a gender identifier, an underlying disease identifier, and a season identifier, performing one-hot encoding on the confounding variable set to obtain a confounding variable encoding vector, and concatenating the confounding variable encoding vector with the initial latent representation vector distribution parameters to obtain extended latent representation distribution parameters carrying confounding information; invoking a distribution alignment module in the causal variational autoencoder network, and calculating the maximum mean discrepancy (MMD) distance over the extended latent representation distribution parameters, grouped by the confounding variable encoding vector, to generate a distribution alignment loss value representing the differences between the latent representation distributions at different confounding levels; performing distribution alignment constrained optimization on the extended latent representation distribution parameters according to the distribution alignment loss value, and adjusting the network weight parameters of the encoder module through gradient back propagation so that the latent representation distributions of the different confounding level groups gradually converge, to obtain causal latent representation vector distribution parameters from which confounding variable bias has been eliminated; randomly sampling from the causal latent representation vector distribution parameters to obtain an initial causal latent representation vector sample, inputting the initial causal latent representation vector sample into a decoder module of the causal variational autoencoder network for data reconstruction, and generating a reconstructed data set whose dimensions match the input data; calculating a reconstruction error loss value between the reconstructed data set and the original input data, calculating a KL divergence loss value between the causal latent representation vector distribution parameters and a standard normal prior distribution, and computing a weighted sum of the reconstruction error loss value, the KL divergence loss value, and the distribution alignment loss value to obtain a total optimization loss function value of the causal variational autoencoder network; jointly and iteratively training the encoder module and the decoder module of the causal variational autoencoder network based on the total optimization loss function value until the total optimization loss function value converges within a preset threshold range, and extracting the causal latent representation vector corresponding to each input data unit from the output layer of the trained encoder module; and arranging and integrating all extracted causal latent representation vectors by time order and level label to generate a causal latent representation vector set containing time-sequence relations and level attribution information, wherein each dimension of each vector in the causal latent representation vector set corresponds to a causal feature unit from which confounding variable bias has been eliminated.
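The loss composition described in claim 3 can be illustrated with a short NumPy sketch. It is not the patented implementation: the Gaussian kernel, the biased MMD estimator, the pairwise averaging over confounder groups, the choice to align the mean vectors only, and all function and parameter names (`mmd2`, `total_loss`, `w_rec`, `w_kl`, `w_align`) are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian kernel between the rows of a and the rows of b.
    sq = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    # Biased estimate of the squared maximum mean discrepancy of two samples.
    return float(rbf_kernel(x, x, bandwidth).mean()
                 + rbf_kernel(y, y, bandwidth).mean()
                 - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def kl_to_standard_normal(mu, logvar):
    # KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch.
    per_sample = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    return float(np.mean(per_sample))

def total_loss(x, x_hat, mu, logvar, groups, w_rec=1.0, w_kl=1.0, w_align=1.0):
    # Reconstruction error: mean squared error summed over features.
    rec = float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
    kl = kl_to_standard_normal(mu, logvar)
    # Alignment loss (assumption): average MMD between the latent means of
    # every pair of confounder groups.
    labels = np.unique(groups)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    align = (float(np.mean([mmd2(mu[groups == a], mu[groups == b])
                            for a, b in pairs])) if pairs else 0.0)
    # Weighted sum of the three terms, as in the claim.
    return w_rec * rec + w_kl * kl + w_align * align
```

In a real training loop these terms would be computed in an automatic differentiation framework so that gradients can flow back to the encoder; NumPy is used here only to make the arithmetic explicit.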
- 4. The depression onset risk prediction method based on causal characterization learning according to claim 3, wherein jointly and iteratively training the encoder module and the decoder module of the causal variational autoencoder network based on the total optimization loss function value until the total optimization loss function value converges within the preset threshold range, and extracting the causal latent representation vector corresponding to each input data unit from the output layer of the trained encoder module, comprises: dividing the multi-source implicit monitoring data set into a training data subset and a validation data subset; initializing all trainable weight parameters of the encoder module and the decoder module of the causal variational autoencoder network to small random values, and setting an optimizer type identifier, an initial learning rate parameter, and a batch size parameter; randomly drawing a batch of multi-source implicit monitoring data units carrying level labels from the training data subset, inputting the drawn data units into the encoder module of the current iteration round for forward propagation, and obtaining the causal latent representation vector distribution parameters of the current batch; calculating the distribution alignment loss value, the reconstruction error loss value, and the KL divergence loss value from the causal latent representation vector distribution parameters of the current batch, and calculating the total optimization loss function value of the current batch according to a preset combination of weighting coefficients; calculating the gradient of each trainable weight parameter in the encoder module and the decoder module by automatic differentiation based on the total optimization loss function value of the current batch, and updating all trainable weight parameters from the calculated gradients using a preset optimization algorithm; after each batch parameter update, inputting all data units in the validation data subset into the updated encoder module and decoder module for forward propagation, calculating the total optimization loss function value for the validation data subset, and recording it as a validation loss value; judging whether the validation loss values over a plurality of iteration rounds show a downward trend or whether their fluctuation amplitude exceeds a preset stability threshold, and, if the validation loss values continue to decrease or the fluctuation amplitude exceeds the stability threshold, continuing with batch extraction and forward propagation for the next training iteration; if the validation loss value has stopped decreasing and the fluctuation amplitude is below the preset stability threshold, determining that model training has converged, stopping the iterative optimization, and saving the weight parameters of the encoder module and the decoder module at the current iteration round as the trained model parameters; sequentially inputting all data units in the multi-source implicit monitoring data set into the trained encoder module for forward propagation, and extracting the causal latent representation vector corresponding to each data unit from the output of the last hidden layer of the encoder module; and performing dimension standardization on the extracted causal latent representation vectors to generate a final causal latent representation vector set with a unified scale, wherein each vector dimension in the final causal latent representation vector set corresponds to a causal feature unit from which confounding variable bias has been eliminated.
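The convergence test in claim 4 (validation loss no longer decreasing and fluctuation amplitude below a stability threshold) could be checked as in the following sketch. The window length, the threshold value, and the way "no longer decreasing" is measured are all assumptions; the claim only says these quantities are preset.

```python
def has_converged(val_losses, window=5, stability_threshold=1e-3):
    # Not enough history yet to compare a recent window against earlier rounds.
    if len(val_losses) < window + 1:
        return False
    recent = val_losses[-window:]
    earlier = val_losses[:-window]
    # Assumption: the loss is "still decreasing" if the recent window reached
    # a new minimum relative to all earlier rounds.
    still_decreasing = min(recent) < min(earlier)
    # Fluctuation amplitude of the recent window.
    amplitude = max(recent) - min(recent)
    return (not still_decreasing) and amplitude < stability_threshold
```

A training loop would append the validation loss after each update and stop, saving the current encoder and decoder weights, as soon as `has_converged` returns `True`.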
- 5. The depression onset risk prediction method based on causal characterization learning according to claim 1, wherein performing directed acyclic graph learning with penalty terms based on the causal latent representation vector set and the causal variable level labels, and constructing the causal link graph model comprising the causal conduction paths among the exogenous exposure factor level, the endophenotype abnormality mediator level, and the depression onset outcome level, comprises: splitting the causal latent representation vector set, according to the causal variable level labels, into an exogenous exposure factor latent representation matrix, an endophenotype abnormality mediator latent representation matrix, and a depression onset outcome latent representation vector, wherein the rows of the exogenous exposure factor latent representation matrix correspond to exposure factor feature units and the rows of the endophenotype abnormality mediator latent representation matrix correspond to endophenotype abnormality feature units; performing time-lag processing on the exogenous exposure factor latent representation matrix and the endophenotype abnormality mediator latent representation matrix to generate an extended feature matrix comprising the current time point and a plurality of lagged time points, wherein the extended feature matrix is used to capture the delayed causal effects of exposure factors and endophenotype abnormalities on the onset outcome; initializing an adjacency matrix whose dimension equals the total number of causal feature units plus the total number of onset outcome units, wherein each element of the adjacency matrix represents whether a causal edge exists between the two corresponding variables and the direction of that edge, and all diagonal elements are set to zero to exclude self-loops; constructing a smooth linear structural equation model with the adjacency matrix as its variable, wherein the smooth linear structural equation model expresses the value of each variable as a linear combination of the other variables plus a noise term, and the coefficients of the linear combination are given by the non-zero elements of the corresponding row of the adjacency matrix; substituting the extended feature matrix and the depression onset outcome latent representation vector into the smooth linear structural equation model, calculating by least squares the residual sum of squares between the predicted and actual values of each variable, and summing the residual sums of squares over all variables to obtain a model fitting loss function value; adding to the model fitting loss function value a sparsity penalty term and a directed acyclic constraint term based on the adjacency matrix, wherein the sparsity penalty term controls the number of causal edges to prevent overfitting and the directed acyclic constraint term ensures that the learned graph structure contains no directed cycle; iteratively solving the joint objective function, with the penalty term and the constraint term added, using an augmented Lagrangian method and a gradient descent algorithm, updating the element values of the adjacency matrix at each iteration and updating the Lagrange multiplier and the penalty coefficient to enforce the directed acyclic constraint; during the iterative optimization, applying level constraints to the adjacency matrix according to the causal variable level labels, prohibiting causal edges that point from the endophenotype abnormality mediator level to the exogenous exposure factor level and prohibiting causal edges that point from the depression onset outcome level to any other level, so that the causal directions conform to the preset level order; stopping the optimization when the maximum number of iterations is reached or the change in the adjacency matrix falls below a preset convergence threshold, thresholding the resulting adjacency matrix by setting elements whose absolute value is below a preset significance threshold to zero, and retaining elements whose absolute value is above the threshold as valid causal edges; and constructing, from the thresholded adjacency matrix, a causal link graph model comprising variable nodes and directed edges, wherein each node in the causal link graph model corresponds to a causal feature unit or an onset outcome unit, each directed edge points from a cause node to an effect node, the edge weights correspond to the element values of the adjacency matrix, and the causal link graph model fully presents the causal conduction paths from the exogenous exposure factor level through the endophenotype abnormality mediator level to the depression onset outcome level.
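The joint objective in claim 5 (least-squares fit, sparsity penalty, acyclicity term handled by an augmented Lagrangian) resembles the continuous DAG-learning formulation known as NOTEARS; the claim does not name it, so that correspondence is an assumption. The sketch below shows one common acyclicity function, a level mask, and the objective value. The convention that `W[i, j]` is the weight of edge i to j, the integer level codes, the truncated series for the matrix exponential, and all names are illustrative.

```python
import numpy as np

def acyclicity(W, terms=30):
    # h(W) = tr(exp(W * W)) - d, which is zero exactly when W encodes a DAG.
    # The matrix exponential is approximated by a truncated power series.
    d = W.shape[0]
    M = W * W  # element-wise square keeps all entries non-negative
    term = np.eye(d)
    total = np.eye(d)
    for k in range(1, terms):
        term = term @ M / k
        total = total + term
    return float(np.trace(total) - d)

def hierarchy_mask(levels):
    # levels[i]: 0 = exogenous exposure, 1 = endophenotype mediator, 2 = outcome.
    # mask[i, j] = 1 if an edge i -> j is allowed: edges may not point to a
    # lower level, and self-loops are excluded.
    lv = np.asarray(levels)
    mask = (lv[:, None] <= lv[None, :]).astype(float)
    np.fill_diagonal(mask, 0.0)
    return mask

def objective(W, X, lam, alpha, rho):
    # Augmented-Lagrangian objective: fit + L1 sparsity + acyclicity terms.
    n = X.shape[0]
    resid = X - X @ W
    fit = 0.5 / n * np.sum(resid**2)
    h = acyclicity(W)
    return float(fit + lam * np.abs(W).sum() + alpha * h + 0.5 * rho * h**2)
```

During optimization the mask would be applied as `W * hierarchy_mask(levels)` after each gradient step, so forbidden edges stay at zero regardless of the data.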
- 6. The depression onset risk prediction method based on causal characterization learning according to claim 5, wherein after constructing the causal link graph model comprising variable nodes and directed edges from the thresholded adjacency matrix, the method further comprises: quantifying the causal effect of each directed edge in the causal link graph model by extracting the adjacency matrix element value corresponding to each directed edge as a direct causal effect coefficient, wherein the direct causal effect coefficient represents the expected change in the effect variable when the cause variable changes by one unit; for each multi-step causal path from an exogenous exposure factor level node through endophenotype abnormality mediator level nodes to a depression onset outcome level node, calculating the indirect causal effect value of the path by the path coefficient product method, that is, multiplying all direct causal effect coefficients along the path; summing all indirect causal effect values that pass through the same endophenotype abnormality mediator level node to obtain the total mediation effect of that endophenotype abnormality between the exposure factors and the onset outcome, wherein the total mediation effect quantifies the mediating strength of the endophenotype abnormality in the causal conduction process; calculating the total causal effect of each exogenous exposure factor node in the causal link graph model by adding the direct causal effect value of the node on the depression onset outcome level node to the indirect causal effect values through all mediating paths, to obtain the total causal effect value of the exposure factor node on the onset outcome; ranking the nodes in the causal link graph model by importance according to the total causal effect value of each causal feature unit, and generating a core risk driver ranking list, wherein the core risk driver ranking list identifies the key feature units contributing most to depression onset; evaluating the stability of the causal link graph model by drawing a plurality of sample subsets from the original data using bootstrap resampling, and repeating the directed acyclic graph learning with penalty terms on each sample subset to obtain a plurality of adjacency matrix samples; calculating the occurrence frequency and the weight coefficient standard deviation of each causal edge across the plurality of adjacency matrix samples, marking any causal edge whose occurrence frequency is below a preset frequency threshold or whose weight coefficient standard deviation is above a preset stability threshold as an unstable edge, and removing the unstable edges from the causal link graph model; simplifying the causal link graph model from which the unstable edges have been removed by retaining the core nodes whose cumulative causal effect contribution rate reaches a preset proportion, together with the causal edges among those core nodes, to generate a simplified core causal link graph model; and storing in structured form the causal feature unit identifier and level attribution of each node in the simplified core causal link graph model and the causal effect coefficients between nodes, to construct an interpretable depression onset causal knowledge base.
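For a linear model on a directed acyclic graph, the path coefficient product method in claim 6 has a compact matrix form: summing the products over all directed paths gives `(I - W)^{-1} - I`. The sketch below uses that identity; the convention that `W[i, j]` is the direct effect of node i on node j and the three-node example values are assumptions for illustration.

```python
import numpy as np

def total_effects(W):
    # Sum of path-coefficient products over all directed paths of any length:
    # W + W^2 + W^3 + ... = (I - W)^{-1} - I, valid because W is a DAG.
    d = W.shape[0]
    return np.linalg.inv(np.eye(d) - W) - np.eye(d)

def rank_drivers(W, outcome):
    # Rank nodes by the magnitude of their total effect on the outcome node.
    T = total_effects(W)
    return [int(i) for i in np.argsort(-np.abs(T[:, outcome]))]

# Hypothetical graph: node 0 = exposure, node 1 = mediator, node 2 = outcome.
W = np.zeros((3, 3))
W[0, 1] = 0.5   # exposure -> mediator
W[1, 2] = 0.4   # mediator -> outcome
W[0, 2] = 0.1   # exposure -> outcome (direct)

T = total_effects(W)
direct = W[0, 2]              # 0.1
indirect = T[0, 2] - direct   # 0.5 * 0.4 = 0.2
```

Here the total effect of the exposure on the outcome is the direct coefficient plus the single mediated path, 0.1 + 0.5 × 0.4 = 0.3.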
- 7. The depression onset risk prediction method based on causal characterization learning according to claim 1, wherein performing causal weighted fusion on the individual multi-source time-series monitoring data acquired in real time according to the causal effect weight corresponding to each causal feature unit in the causal link graph model to obtain the individual causal weighted fusion feature time-series matrix, inputting the individual causal weighted fusion feature time-series matrix into the ultra-early depression onset risk assessment model for onset probability prediction, and generating the depression onset risk probability output result containing the prediction time window identifier, comprises: acquiring individual multi-source time-series monitoring data of the target object, wherein the individual multi-source time-series monitoring data comprise real-time sequences of autonomic nervous function nonlinear micro-features, real-time sequences of smart-terminal touch-behavior micro-features, real-time units of environmental spatiotemporal big data, and real-time records of non-depression-related medical visit trajectories from the electronic health record, each corresponding to nodes in the causal link graph model; performing on each real-time feature sequence in the individual multi-source time-series monitoring data the same causal latent representation extraction as in the model training stage, by invoking the encoder module of the trained causal variational autoencoder network to map each real-time feature sequence to an individual real-time causal latent representation vector, to obtain an individual real-time causal latent representation matrix; extracting the total causal effect value corresponding to each causal feature unit from the causal link graph model, and normalizing the total causal effect values to obtain a causal fusion weight coefficient for each causal feature unit, wherein the causal fusion weight coefficient is positively correlated with the total causal effect value; multiplying each column vector of the individual real-time causal latent representation matrix element by element with the causal fusion weight coefficient of the corresponding causal feature unit to obtain a weighted individual real-time causal latent representation matrix, and concatenating the weighted matrix row-wise by time point to generate an individual causal weighted fusion feature vector for each time point; stacking the individual causal weighted fusion feature vectors of all time points in time order to construct an individual causal weighted fusion feature time-series matrix containing time dimension information and feature dimension information, wherein the row indices of the matrix correspond to consecutive acquisition time points and the column indices correspond to the weighted and fused causal feature dimensions; inputting the individual causal weighted fusion feature time-series matrix into the pre-trained ultra-early depression onset risk assessment model, wherein the ultra-early depression onset risk assessment model comprises a bidirectional long short-term memory (BiLSTM) network layer and a Cox proportional hazards output layer, capturing forward and backward context information from the time-series matrix through the BiLSTM network layer, and extracting a dynamic evolution pattern vector of the individual's causal features; inputting the dynamic evolution pattern vector into the Cox proportional hazards output layer, which calculates the individual's cumulative onset hazard function values over a plurality of preset future time windows from a preset baseline hazard function and a linear combination of the dynamic evolution pattern vector; applying a probability transformation to the cumulative onset hazard function values to generate the individual's depression onset probability within a preset first future time window, within a preset second future time window, and within a preset third future time window, which together form the depression onset risk probability output result containing the time window identifier; and comparing the depression onset risk probability output result with preset risk probability division thresholds, and generating a corresponding risk probability interval identifier according to the comparison result, wherein the risk probability interval identifier is one of a first probability interval identifier, a second probability interval identifier, a third probability interval identifier, and a fourth probability interval identifier.
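Two arithmetic steps of claim 7 can be sketched directly: scaling latent features by normalized causal weights, and converting a Cox cumulative hazard into per-window onset probabilities via P(onset by t) = 1 − exp(−H0(t)·exp(score)). Normalizing by absolute value so the weights sum to one is an assumption (the claim only requires positive correlation with the total effect), and the baseline cumulative hazard values in the usage example are invented for illustration.

```python
import numpy as np

def fusion_weights(total_effect):
    # Assumption: weights are absolute total effects normalized to sum to 1.
    a = np.abs(np.asarray(total_effect, dtype=float))
    return a / a.sum()

def weighted_fusion(Z, weights):
    # Z has shape (time_points, causal_features); each column is scaled by
    # the weight of its causal feature unit.
    return np.asarray(Z, dtype=float) * np.asarray(weights)[None, :]

def onset_probabilities(risk_score, baseline_cum_hazard):
    # Cox model: H(t | z) = H0(t) * exp(score); P(onset by t) = 1 - exp(-H).
    H = np.asarray(baseline_cum_hazard, dtype=float) * np.exp(risk_score)
    return 1.0 - np.exp(-H)

# Usage with made-up numbers: two features, three future time windows.
w = fusion_weights([0.3, -0.1])
F = weighted_fusion(np.ones((4, 2)), w)
p = onset_probabilities(0.5, [0.01, 0.03, 0.06])
```

Because the baseline cumulative hazard is non-decreasing in time, the resulting probabilities are non-decreasing across the first, second, and third windows.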
- 8. The causal characterization learning based depression risk prediction method according to claim 7, wherein before inputting the individual causal weighted fusion feature timing matrix into a pre-trained depression ultra early onset risk assessment model, the method further comprises: An initial depression super-early onset risk assessment model is established, the initial depression super-early onset risk assessment model is formed by sequentially connecting an input layer, a two-way long-short-term memory network layer, a full-connection mapping layer and a Cox proportion risk output layer, the dimension of the input layer is matched with the dimension of an individual causal weighted fusion feature vector, and the two-way long-short-term memory network layer comprises a forward long-short term memory unit and a backward long-short term memory unit; Acquiring a sample queue training data set, wherein the sample queue training data set comprises a causal weighted fusion characteristic time sequence matrix of a plurality of time points of a plurality of confirmed depression patients before diagnosis, corresponding diagnosis time point identifiers, a causal weighted fusion characteristic time sequence matrix of a plurality of healthy control individuals in the same follow-up time length and undiagnosed identifiers; Taking a causal weighted fusion characteristic time sequence matrix of each individual in the sample queue training data set as input, taking a diagnosis time point mark or an undiagnosed mark of the corresponding individual as a supervision tag, and constructing a model training sample pair set; Sequentially inputting a time sequence matrix in the model training sample pair set into an initial depression ultra-early onset risk assessment model, transmitting the time sequence matrix to a two-way long-short-term memory network layer through an input layer, and generating a context sensing feature vector of each time step by the two-way long-short-term 
memory network layer by fusing a forward hidden state and a backward hidden state at each time step; Inputting the context-aware feature vector of the last time step into a full-connection mapping layer, and generating a compressed time sequence evolution mode feature vector through linear transformation and nonlinear activation functions of the full-connection mapping layer, wherein the dimension of the time sequence evolution mode feature vector is smaller than that of the original context-aware feature vector; Inputting the time sequence evolution mode feature vector into a Cox proportion risk output layer, calculating the morbidity risk function value of each individual at each time point in the observation period according to a preset reference risk function by the Cox proportion risk output layer, and integrating the morbidity risk function value on a time axis to obtain an accumulated risk function value; constructing a partial likelihood loss function according to the accumulated risk function value of each individual and the actual diagnosis time point or the undiagnosed identification, wherein the partial likelihood loss function is used for measuring the coincidence degree of the risk sequencing predicted by the model and the actual situation, and endowing positive contribution to individuals with correct risk function value sequencing and endowing negative contribution to individuals with wrong sequencing; Minimizing the partial likelihood loss function by adopting a gradient descent optimization algorithm, calculating gradients of all trainable parameters in a two-way long-short-term memory network layer, a full-connection mapping layer and a Cox proportion risk output layer by back propagation, and iteratively executing forward propagation and back propagation operations until the loss function converges according to gradient update parameter values; And introducing an early-stopping mechanism in the model training process, dividing the sample queue 
training data set into a training subset and a validation subset, calculating the partial likelihood loss value on the validation subset after each training round, stopping training when the partial likelihood loss value on the validation subset has failed to decrease for a plurality of consecutive rounds, and storing the model parameters of the current round as the pre-trained depression ultra-early onset risk assessment model.
- 9. The depression onset risk prediction method based on causal characterization learning according to claim 1, wherein after generating the depression onset risk probability output result containing the prediction time window identifier, the method further comprises: acquiring the individual causal weighted fusion feature time sequence matrix of the target object within a preset continuous historical time window, and inputting it into a pre-constructed depression precursor period dynamic abnormal track recognition module for abnormal track matching processing; wherein the depression precursor period dynamic abnormal track recognition module pre-stores a plurality of precursor period abnormal track sub-template libraries obtained by clustering sample cohort data, the abnormal track sub-template libraries comprising a first precursor period abnormal track template, a second precursor period abnormal track template and a third precursor period abnormal track template, and each abnormal track template comprising a causal feature mean value sequence and an abnormal threshold sequence for each time point before onset of the corresponding subtype; calculating, by a dynamic time warping algorithm, a similarity distance measurement value between the individual causal weighted fusion feature time sequence matrix and each abnormal track template, and also calculating a similarity distance measurement value between the individual causal weighted fusion feature time sequence matrix and a normal fluctuation baseline template of the healthy population; subtracting the similarity distance measurement value for the normal fluctuation baseline template of the healthy population from the minimum similarity distance measurement value between the individual causal weighted fusion feature time sequence matrix and the abnormal track templates to obtain an individual abnormal track deviation score, which quantifies the degree to which the individual's time sequence features deviate from the normal range; if the individual abnormal track deviation score exceeds a preset precursor period abnormality judgment threshold, determining that the target object is currently in a depression precursor period abnormal track state, and taking the subtype identifier of the abnormal track template having the minimum similarity distance to the individual causal weighted fusion feature time sequence matrix as the matched abnormal track subtype; predicting, according to the historical time window distribution data corresponding to the matched abnormal track subtype, an estimated time window, within a specific future time period, for the target object to progress from the current state to clinically confirmed depression, and generating precursor period progression early warning information comprising an estimated start time point and an estimated end time point; and associating and integrating the depression onset risk probability output result with the precursor period progression early warning information to generate a comprehensive risk assessment report comprising a risk probability value, a probability interval identifier, a precursor period track subtype identifier and an expected onset time window.
- 10. A causal characterization learning-based depression onset risk prediction system, comprising: A processor; a machine-readable storage medium storing machine-executable instructions for the processor; Wherein the processor is configured to perform the causal characterization learning based depression onset risk prediction method of any one of claims 1 to 9 via execution of the machine executable instructions.
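As a non-limiting illustration of the partial likelihood loss recited in claim 8, the following minimal sketch computes the standard negative log partial likelihood of a Cox proportional hazards model. It rests on several assumptions that are not part of the claims: each individual is summarized by a single scalar risk score (a log relative hazard) rather than a full cumulative hazard curve, tied diagnosis times receive no special correction, and the function name `cox_neg_log_partial_likelihood` is hypothetical.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Mean negative log partial likelihood of a Cox proportional hazards model.

    risk_scores: one scalar model output (log relative hazard) per individual
    times:       time to diagnosis, or follow-up length if never diagnosed
    events:      1 if diagnosed (event observed), 0 if undiagnosed (censored)
    """
    loss = 0.0
    n_events = 0
    for score_i, t_i, e_i in zip(risk_scores, times, events):
        if not e_i:
            continue  # censored individuals only appear inside risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [s for s, t in zip(risk_scores, times) if t >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(s - m) for s in risk_set))
        loss -= score_i - log_sum
        n_events += 1
    return loss / max(n_events, 1)


# Hypothetical toy cohort: two diagnosed individuals and one healthy control.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
well_ordered = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
mis_ordered = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
print(well_ordered < mis_ordered)
```

In this sketch a lower loss corresponds to a risk ordering that agrees better with the observed diagnosis times, which is the quantity a gradient descent optimizer would minimize.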
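As a non-limiting illustration of the dynamic time warping matching recited in claim 9, the sketch below computes a similarity distance between an individual feature time sequence and each stored template, then selects the template with the minimum distance. The Euclidean local cost, the template names and the helper names `dtw_distance` and `nearest_template` are assumptions made for illustration. The sketch stops at nearest-template selection; the deviation score and threshold comparison are omitted.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two feature vectors (assumed cost).
            step = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance only in seq_a
                cost[i][j - 1],      # advance only in seq_b
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def nearest_template(individual, templates):
    """Return (template_name, distance) for the template closest to `individual`."""
    distances = {name: dtw_distance(individual, tpl) for name, tpl in templates.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical one-dimensional feature sequences; real inputs would be the rows
# of the individual causal weighted fusion feature time sequence matrix.
individual = [[0.0], [1.0], [2.0]]
templates = {
    "subtype_1": [[0.0], [1.0], [1.0], [2.0]],
    "subtype_2": [[5.0], [5.0], [5.0]],
}
print(nearest_template(individual, templates))
```

Dynamic time warping is useful here because it tolerates sequences of different lengths and locally stretched timing, so a trajectory that evolves more slowly than a template can still match it.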
Description
Depression onset risk prediction method and system based on causal characterization learning
Technical Field
The invention relates to the technical field of intelligent medical treatment, and in particular to a depression onset risk prediction method and system based on causal characterization learning.
Background
Depression is a common mental disorder that severely affects physical and mental health and social functioning, and its pathogenesis is complex and not yet fully understood. Traditional methods for predicting the risk of depression onset rely mainly on symptom information actively reported by patients, clinical interviews, simple psychological assessments and the like. These methods have several limitations. On the one hand, the information a patient provides may be inaccurate or incomplete because of limited insight into their own symptoms, deliberate concealment or limited expressive ability, which undermines the reliability of the prediction result. On the other hand, traditional methods tend to focus on currently manifest symptoms and have difficulty capturing potential early signs before onset, especially those hidden in an individual's daily behavior, physiological state and environmental factors. With the rapid development of information technology and sensor technology, multi-source data acquisition has become possible. Multi-source implicit monitoring data, such as an individual's autonomic nervous function and touch behavior, can be acquired continuously through wearable devices, intelligent terminals and the like, and can be combined with environmental space-time big data and the non-depression-related treatment track data in electronic health records; together these data contain rich information related to depression onset.
However, most existing approaches simply apply statistical analysis or machine learning modeling to these data without fully considering the causal relationships among them, and cannot effectively eliminate the bias introduced by confounding variables. As a result, they struggle to reveal the underlying causal mechanism of depression onset, the accuracy and reliability of onset risk prediction remain low, and the practical need for ultra-early, precise prevention of and intervention in depression is not met.
Disclosure of Invention
In view of the above problems, in a first aspect, an embodiment of the present invention provides a depression onset risk prediction method based on causal characterization learning, the method comprising:
acquiring a multi-source implicit monitoring data set of a target object, wherein the multi-source implicit monitoring data set comprises continuously acquired autonomic nervous function nonlinear micro-feature data units with time sequence identifiers, intelligent terminal touch behavior micro-feature data units, environment space-time big data units and non-depression-related treatment track data units from an electronic health record;
performing causal variable hierarchy marking on the multi-source implicit monitoring data set according to a preset causal hierarchy division rule to obtain the exogenous exposure factor hierarchy marks, endophenotype abnormal intermediary hierarchy marks and depression onset outcome hierarchy marks corresponding to each data unit in the multi-source implicit monitoring data set;
invoking a pre-constructed causal variational autoencoder network to perform causal latent characterization extraction on the multi-source implicit monitoring data set carrying the causal variable hierarchy marks, and generating a causal latent characterization vector set that is causally related to the onset of depression, wherein each vector dimension in the causal latent characterization vector set corresponds to a causal feature unit from which confounding variable bias has been eliminated;
performing directed acyclic graph learning with a penalty term based on the causal latent characterization vector set and the causal variable hierarchy marks, and constructing a causal link graph model comprising the causal conduction paths among the exogenous exposure factor hierarchy, the endophenotype abnormal intermediary hierarchy and the depression onset outcome hierarchy;
and performing causal weighted fusion on the individual multi-source time sequence monitoring data acquired in real time according to the causal effect weight corresponding to each causal feature unit in the causal link graph model to obtain an individual causal weighted fusion feature time sequence matrix, inputting the individual causal weighted fusion feature time sequence matrix into a depression ultra-early onset risk assessment model for onset probability prediction, and generating a depression onset risk probability output result containing a prediction tim