CN-122025142-A - Health risk assessment method and system based on big data analysis


Abstract

The invention discloses a health risk assessment method and system based on big data analysis. The method comprises: acquiring multi-source health data and standardizing it to obtain multi-modal time-series samples; generating a feature vector and a quality vector for each modality and mapping the feature vectors to basic belief assignments (BBAs); dividing the BBAs into a hard evidence layer and a soft evidence layer and discounting them to obtain hard-layer discounted evidence and soft-layer discounted evidence; calculating and routing the conflicting mass of the hard-layer discounted evidence to obtain hard-layer fusion evidence and a hard-layer conflict degree; routing the conflict of the soft-layer discounted evidence and redistributing it with the PCR6 rule to obtain soft-layer fusion evidence and a soft-layer conflict degree; calculating the inter-layer conflicting mass and redistributing it with the PCR6 rule to obtain an inter-layer conflict degree; and outputting a risk level and a risk confidence. By incorporating the PCR6 (Proportional Conflict Redistribution rule no. 6) mechanism, the invention improves the stability of health risk assessment results.
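To make the PCR6 mechanism named in the abstract concrete, here is a minimal Python sketch of two-source PCR6 fusion. It assumes, as the claims describe, that the focal elements are the individual health risk levels plus the unknown set, written here as a hypothetical key THETA; a level combined with the unknown set stays that level, and two different levels conflict. For two sources PCR6 coincides with PCR5: each conflicting product m1(a)·m2(b) is returned to a and b in proportion to m1(a) and m2(b). The level names and masses are illustrative only.

```python
from itertools import product

THETA = "Theta"  # hypothetical key for the unknown set (the whole frame)


def pcr6_two(m1: dict[str, float], m2: dict[str, float]) -> tuple[dict[str, float], float]:
    """Fuse two BBAs whose focal elements are single risk levels plus THETA.

    Returns the fused BBA and the total conflicting mass K.
    """
    fused = {k: 0.0 for k in set(m1) | set(m2)}
    conflict = 0.0
    for a, b in product(m1, m2):
        x, y = m1[a], m2[b]
        p = x * y
        if p == 0.0:
            continue
        if a == b or b == THETA:  # a ∩ a = a and a ∩ THETA = a
            fused[a] += p
        elif a == THETA:          # THETA ∩ b = b
            fused[b] += p
        else:                     # a ∩ b is empty: this product is a conflict term
            conflict += p
            # two-source PCR6: return p to a and b in proportion to x and y
            fused[a] += x * p / (x + y)
            fused[b] += y * p / (x + y)
    return fused, conflict


m1 = {"low": 0.6, "high": 0.3, THETA: 0.1}  # illustrative masses
m2 = {"low": 0.2, "high": 0.7, THETA: 0.1}
fused, k = pcr6_two(m1, m2)
```

Here the conflicting mass is K = 0.6·0.7 + 0.3·0.2 = 0.48. Because every conflict term is returned to the two levels that produced it, the fused masses still sum to 1 without dividing by 1 - K.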

Inventors

HU HONGXIANG
ZHOU JIXIANG
ZHAO RAN

Assignees

杭州象限数智科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260203

Claims (9)

1. A health risk assessment method based on big data analysis, characterized by comprising the following steps: acquiring multi-source health data of a target user and standardizing the data to obtain multi-modal time-series samples over a sliding time window; generating a feature vector and a quality vector for each modality of the multi-modal time-series samples, calculating a reliability coefficient from the quality vector, and mapping the feature vector to a basic belief assignment (BBA); dividing the BBAs into a hard evidence layer and a soft evidence layer and applying reliability discounting to obtain hard-layer discounted evidence and soft-layer discounted evidence; calculating and routing the conflicting mass of the hard-layer discounted evidence, transferring the conflicting mass of evidence whose reliability coefficient is below a gating threshold to the unknown set, and redistributing the remaining conflicting mass with the PCR6 rule to obtain hard-layer fusion evidence and a hard-layer conflict degree; routing the conflict of the soft-layer discounted evidence and redistributing it with the PCR6 rule to obtain soft-layer fusion evidence and a soft-layer conflict degree; fusing the hard-layer fusion evidence and the soft-layer fusion evidence across layers, calculating the inter-layer conflicting mass and redistributing it with the PCR6 rule, increasing the weight assigned to the hard layer when the hard-evidence validity condition is met and otherwise transferring the inter-layer conflicting mass to the unknown set to raise the uncertainty, thereby obtaining an inter-layer fusion result and an inter-layer conflict degree; and evaluating a conflict trend criterion using the inter-layer conflict degree and outputting a risk level and a risk confidence in combination with the inter-layer fusion result.
2. The health risk assessment method based on big data analysis according to claim 1, wherein obtaining the multi-modal time-series samples specifically comprises: accessing and aggregating the multi-source health data of the target user, establishing a data source list and data access rules, reading raw records from each data source according to the data access rules, writing them into a unified raw data area, and outputting a raw data set; standardizing the raw data set, writing each standardized record together with its standardization flag into a standardized data area, and outputting a standardized data set; and constructing sliding time windows from the standardized data set by determining the start time of the evaluation period, the window width and the sliding step, generating consecutive window intervals according to the sliding step, assigning each standardized record whose timestamp falls within a window interval to the corresponding window, merging the records in each window by data source to form the per-window multi-modal data subsets, and outputting multi-modal time-series samples ordered by window index.
3. The health risk assessment method based on big data analysis according to claim 1, wherein obtaining the basic belief assignments and reliability coefficients specifically comprises: extracting, from the multi-modal time-series samples, the in-window record set of each modality, parsing timestamps, deduplicating and sorting the records, and counting the in-window records to produce a per-modality record sequence and its record count; generating feature vectors from each modality's record sequence by aggregating the in-window records according to preset feature-generation rules into statistical, trend and fluctuation features, concatenating these in a preset dimension order into a modality feature vector tagged with the modality identifier and the time window identifier, and outputting the set of modality feature vectors; and generating quality vectors and mapping evidence for each modality's record sequence: deriving a missing degree from the expected and actual record counts, a consistency degree from the window end time and the latest record time, and a reliability degree from the proportion of abnormal records, reading the configured credibility of the data source, assembling these into a modality quality vector, calculating the reliability coefficient, inputting the modality feature vector into a modality scoring function to obtain a support degree for each health risk level, scaling the support degrees by the reliability coefficient, and assigning the remaining mass to the unknown set to form the BBA, wherein the BBA comprises the mass assigned to each preset health risk level and the mass assigned to the unknown set.
4. The health risk assessment method based on big data analysis according to claim 1, wherein obtaining the hard-layer discounted evidence and the soft-layer discounted evidence specifically comprises: reading and checking the reliability coefficients and the modality identifiers and time window identifiers attached to the BBAs, discarding records that lack a modality identifier or a time window identifier, merging the records that pass the check by modality identifier, and outputting the reliability coefficient and BBA of each modality; reading a preset evidence-layer division rule, generating an evidence-layer identifier for each modality, classifying each modality's BBA as a hard-layer BBA or a soft-layer BBA according to that identifier, checking the completeness of the classification, and generating a hard-layer BBA list and a soft-layer BBA list; and applying reliability discounting to the hard-layer and soft-layer BBAs separately, scaling the mass of each health risk level of each modality by that modality's reliability coefficient, adding the mass left unassigned after scaling to the unknown-set mass of the same modality to form discounted evidence, and collecting the discounted evidence of all modalities in each layer to obtain the hard-layer discounted evidence and the soft-layer discounted evidence, respectively.
5. The health risk assessment method based on big data analysis according to claim 1, wherein obtaining the hard-layer fusion evidence and the hard-layer conflict degree specifically comprises: matching discounted evidence with reliability coefficients one by one by modality identifier and time window identifier, building a hard-layer processing list from the successfully matched records, and outputting a hard-layer discounted evidence set and a hard-layer reliability coefficient set; traversing the hard-layer discounted evidence set pairwise, reading the masses of each BBA, enumerating the combinations of health risk levels supported by the two pieces of discounted evidence, and adding the mass product of every combination whose two levels differ to the conflicting mass of that evidence pair, thereby obtaining a hard-layer conflicting-mass set, which is aggregated to obtain the hard-layer conflict degree; presetting a gating threshold, generating a gating flag for each entry of the hard-layer reliability coefficient set, marking discounted evidence whose reliability coefficient is below the gating threshold as low-reliability discounted evidence, assigning the conflicting mass of every evidence pair in the hard-layer conflicting-mass set that involves low-reliability discounted evidence to a transfer set and all other conflicting mass to a to-be-redistributed set, summing the transfer set by evidence-pair index, and adding the sum to the unknown-set mass as an unknown-set increment; and invoking PCR6 conflict redistribution to fuse the discounted evidence corresponding to the to-be-redistributed set: selecting two pieces of discounted evidence in a preset combination order, generating a temporary fusion result, enumerating the products of the masses the two pieces assign to each preset health risk level, treating a product as a conflict term when the two levels differ, splitting each conflict term into two parts in proportion to the masses the two pieces contribute to it, and returning each part to the mass of the corresponding health risk level, which completes one PCR6 redistribution; after all conflict terms have been returned, the fusion result serves as the input to the next fusion step, the unknown-set increment is added to the unknown-set mass of the iterative fusion result, and the iteration ends when no discounted evidence remains in the hard-layer list, whereupon the hard-layer fusion evidence is output.
6. The health risk assessment method based on big data analysis according to claim 1, wherein obtaining the soft-layer fusion evidence and the soft-layer conflict degree specifically comprises: reading the soft-layer discounted evidence and the corresponding reliability coefficients in the current sliding time window, matching modality identifiers and time window identifiers item by item, and discarding inconsistent records to form a soft-layer processing list; traversing the soft-layer discounted evidence set pairwise, reading the masses of each BBA, enumerating the level combinations, accumulating the mass products of combinations with different levels as conflicting mass, writing them into a soft-layer conflicting-mass set indexed by evidence pair, and aggregating that set to obtain the soft-layer conflict degree; and reading the gating threshold and generating gating flags, classifying conflicting mass that involves low-reliability discounted evidence as transferred conflicting mass and adding its sum to the unknown-set mass, feeding the evidence corresponding to the remaining conflicting mass into the PCR6 rule for conflict redistribution, splitting each conflict term proportionally back to the masses of the levels that caused it, iterating until all evidence has been used, and outputting the soft-layer fusion evidence.
7. The health risk assessment method based on big data analysis according to claim 1, wherein obtaining the inter-layer fusion result and the inter-layer conflict degree specifically comprises: checking that the time window identifiers of the hard-layer fusion evidence and the soft-layer fusion evidence agree, aligning them, and forming an inter-layer input evidence pair; feeding the evidence pair into inter-layer fusion to calculate the inter-layer conflicting mass, in which the mass products of combinations whose health risk levels differ are counted as conflict terms and accumulated, and writing an inter-layer conflicting-mass record; judging the validity of the hard-layer fusion evidence, generating a valid flag when the hard-evidence validity condition is met and an invalid flag otherwise; when the flag is valid, performing inter-layer biasing by reading a preset inter-layer bias strategy, generating a hard-layer bias weight, amplifying the masses in the hard-layer fusion evidence by that weight, normalizing the result, including the unknown-set mass, after amplification, and outputting biased hard-layer fusion evidence; and performing inter-layer PCR6 fusion: when the flag is valid, taking the biased hard-layer fusion evidence and the soft-layer fusion evidence as inputs, splitting and returning their conflict terms according to the PCR6 redistribution scheme to obtain the inter-layer fusion result, and outputting the inter-layer conflicting mass as the inter-layer conflict degree; when the flag is invalid, transferring the inter-layer conflicting mass to the unknown-set mass, thereby increasing the proportion assigned to the unknown set, to obtain the inter-layer fusion result.
8. The health risk assessment method based on big data analysis according to claim 1, wherein outputting the risk level and the risk confidence specifically comprises: writing the inter-layer conflict degree into a conflict-degree sequence by time window identifier and retrieving the inter-layer conflict degrees of the sliding time windows to form a conflict-degree history sequence; reading a preset conflict-degree threshold, a preset conflict-increment threshold and a preset consecutive-window-count threshold, comparing the current inter-layer conflict degree with the conflict-degree threshold to generate a trend flag, calculating the increments between adjacent inter-layer conflict degrees in the history sequence and comparing them with the conflict-increment threshold to generate increment flags, counting the increment flags over the most recent consecutive windows, and outputting the conflict trend criterion result; and reading the masses from the inter-layer fusion result, outputting the preset health risk level with the largest mass as the risk level, converting the unknown-set mass and the inter-layer conflict degree into a confidence attenuation factor, and combining them to obtain the risk confidence.
9. A health risk assessment system based on big data analysis, which performs the health risk assessment method based on big data analysis according to any one of claims 1 to 8, comprising: a data access and standardization module, configured to access the multi-source health data of a target user, unify fields, codes, time references and units, and output standardized health data; a time window sample construction module, configured to construct sliding time windows from the standardized health data, divide records by time window, and merge them by data source to form multi-modal time-series samples; a feature and quality modeling module, configured to generate per-modality record sequences from the multi-modal time-series samples, extract statistical, trend and fluctuation features to form modality feature vectors, and calculate reliability coefficients; a basic belief assignment module, configured to input the modality feature vectors into the scoring function to obtain a support degree for each health risk level, scale the support degrees by the reliability coefficient, and assign the remaining mass to the unknown set; a layered discounting and gating module, configured to divide the BBAs into a hard evidence layer and a soft evidence layer, apply reliability discounting, calculate conflicting mass, and transfer low-reliability conflict to the unknown set; and a PCR6 fusion and risk output module, configured to perform PCR6 conflict redistribution within the hard and soft layers and inter-layer fusion, and to output the risk level and the risk confidence.
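The window construction in claim 2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the Record type, the half-open interval convention [lo, lo + width) and the rule that a window must end no later than a hypothetical end time are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    t: float      # standardized timestamp on a common time base (assumed: seconds)
    source: str   # data source / modality identifier
    value: float  # standardized value


def build_window_samples(records, start, width, step, end):
    """Split standardized records into sliding windows [lo, lo + width), lo = start + i*step.

    Returns {window_index: {source: [Record, ...] sorted by time}}; windows that would
    extend past `end` are not generated.
    """
    if step <= 0 or width <= 0:
        raise ValueError("width and step must be positive")
    samples = {}
    i = 0
    while start + i * step + width <= end:
        lo, hi = start + i * step, start + i * step + width
        by_source = defaultdict(list)
        for r in records:
            if lo <= r.t < hi:  # record time falls inside this window
                by_source[r.source].append(r)
        # merge by data source, keeping each source's records in time order
        samples[i] = {s: sorted(rs, key=lambda rec: rec.t) for s, rs in by_source.items()}
        i += 1
    return samples


recs = [Record(float(t), "heart_rate", 70.0 + t) for t in range(6)]
recs.append(Record(2.5, "blood_pressure", 120.0))
samples = build_window_samples(recs, start=0.0, width=3.0, step=2.0, end=6.0)
```

With width 3 and step 2 the windows overlap, so the record at t = 2 appears in both window 0 and window 1, as expected for sliding windows whose step is smaller than their width.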
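The feature, quality and mapping steps of claim 3 might look like the sketch below. The patent names the feature groups (statistical, trend, fluctuation) and the quality components (missing degree, consistency degree, reliability degree, source credibility) but not their formulas, so the mean/slope/standard-deviation features, the product used to combine quality components, the softmax that turns level scores into support degrees, and the parameter max_lag are assumptions for illustration. THETA is a hypothetical key for the unknown set.

```python
import math
from statistics import mean, pstdev

THETA = "Theta"  # hypothetical key for the unknown set


def feature_vector(values: list[float]) -> list[float]:
    """[statistic, trend, fluctuation] = [mean, least-squares slope per record, population std]."""
    if not values:
        return [0.0, 0.0, 0.0]
    n = len(values)
    mu = mean(values)
    x_bar = (n - 1) / 2
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sum((x - x_bar) * (v - mu) for x, v in enumerate(values)) / sxx if sxx else 0.0
    return [mu, slope, pstdev(values)]


def reliability_coefficient(expected_n, actual_n, lag, max_lag, abnormal_ratio, source_trust):
    """Combine quality-vector components into a coefficient in [0, 1] (assumed product form)."""
    missing = 1.0 - min(actual_n / expected_n, 1.0) if expected_n > 0 else 1.0
    consistency = max(0.0, 1.0 - lag / max_lag)  # lag = window end time - latest record time
    validity = 1.0 - abnormal_ratio               # from the proportion of abnormal records
    return (1.0 - missing) * consistency * validity * source_trust


def to_bba(scores: dict[str, float], alpha: float) -> dict[str, float]:
    """Softmax level scores into support degrees, scale by alpha, put the rest on THETA."""
    z = max(scores.values())
    exps = {lvl: math.exp(s - z) for lvl, s in scores.items()}
    total = sum(exps.values())
    bba = {lvl: alpha * e / total for lvl, e in exps.items()}
    bba[THETA] = 1.0 - alpha
    return bba


features = feature_vector([1.0, 2.0, 3.0, 4.0])  # [2.5, 1.0, ~1.118]
alpha = reliability_coefficient(10, 8, lag=5, max_lag=60, abnormal_ratio=0.1, source_trust=0.9)
# scores would come from the (unspecified) modality scoring function
bba = to_bba({"low": 0.0, "medium": 0.0, "high": 0.0}, alpha)
```

Whatever the reliability coefficient removes from the support degrees ends up on THETA, so a low-quality modality yields a BBA closer to total ignorance instead of a confident but unreliable judgment.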
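The layer split and reliability discounting of claim 4 correspond to classical Shafer discounting, sketched below under stated assumptions: LAYER_RULE is a hypothetical modality-to-layer table (the patent only says the rule is preset), modality names are invented for illustration, and modalities not covered by the rule are simply skipped.

```python
THETA = "Theta"  # hypothetical key for the unknown set

# Hypothetical evidence-layer division rule (modality -> layer).
LAYER_RULE = {"lab_test": "hard", "ecg": "hard", "wearable_steps": "soft", "questionnaire": "soft"}


def discount(bba: dict[str, float], r: float) -> dict[str, float]:
    """Shafer discounting: m'(A) = r*m(A) for each level, m'(THETA) = r*m(THETA) + (1 - r)."""
    out = {lvl: r * m for lvl, m in bba.items() if lvl != THETA}
    out[THETA] = r * bba.get(THETA, 0.0) + (1.0 - r)
    return out


def split_and_discount(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """items: dicts with 'modality', 'window', 'r', 'bba'. Returns (hard, soft) discounted evidence."""
    hard, soft = [], []
    for it in items:
        if not it.get("modality") or it.get("window") is None:
            continue  # reject records missing a modality or time window identifier
        layer = LAYER_RULE.get(it["modality"])
        if layer is None:
            continue  # not covered by the rule (assumed: skip)
        ev = {"modality": it["modality"], "window": it["window"], "r": it["r"],
              "bba": discount(it["bba"], it["r"])}
        (hard if layer == "hard" else soft).append(ev)
    return hard, soft


items = [
    {"modality": "lab_test", "window": 0, "r": 0.5, "bba": {"low": 0.5, "high": 0.3, THETA: 0.2}},
    {"modality": "questionnaire", "window": 0, "r": 0.8, "bba": {"low": 0.7, THETA: 0.3}},
    {"modality": "ecg", "window": None, "r": 0.9, "bba": {"high": 1.0}},  # rejected: no window id
]
hard, soft = split_and_discount(items)
```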
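Claims 5 and 6 apply the same gated procedure to the hard and soft layers. The sketch below assumes that the layer conflict degree is the mean pairwise conflicting mass, that evidence below the gate tau is left out of the PCR6 fusion (its conflict only feeds the unknown-set increment), and that the result is renormalized after the increment is added so it remains a valid BBA; the claims fix none of these choices. Sequential PCR6 is not associative, so the list order stands in for the "preset combination order". THETA is a hypothetical key for the unknown set.

```python
from itertools import combinations

THETA = "Theta"  # hypothetical key for the unknown set


def pair_conflict(m1, m2):
    """Conflicting mass of one evidence pair: sum of m1(a)*m2(b) over distinct levels a != b."""
    return sum(x * y for a, x in m1.items() for b, y in m2.items()
               if THETA not in (a, b) and a != b)


def pcr6_two(m1, m2):
    """Two-source PCR6 over single levels plus THETA (conflict split back proportionally)."""
    fused = {k: 0.0 for k in set(m1) | set(m2)}
    for a, x in m1.items():
        for b, y in m2.items():
            p = x * y
            if p == 0.0:
                continue
            if a == b or b == THETA:
                fused[a] += p
            elif a == THETA:
                fused[b] += p
            else:
                fused[a] += x * p / (x + y)
                fused[b] += y * p / (x + y)
    return fused


def gated_layer_fusion(evidence, tau):
    """evidence: list of (discounted BBA, reliability r). Returns (fused BBA, layer conflict degree)."""
    conflicts = {}
    transfer = 0.0
    for (i, (mi, ri)), (j, (mj, rj)) in combinations(enumerate(evidence), 2):
        conflicts[(i, j)] = pair_conflict(mi, mj)
        if ri < tau or rj < tau:  # pair involves low-reliability evidence: route to transfer set
            transfer += conflicts[(i, j)]
    # assumed aggregation: mean pairwise conflicting mass
    degree = sum(conflicts.values()) / len(conflicts) if conflicts else 0.0
    reliable = [m for m, r in evidence if r >= tau]
    fused = dict(reliable[0]) if reliable else {THETA: 1.0}
    for m in reliable[1:]:  # sequential pairwise PCR6 in list order
        fused = pcr6_two(fused, m)
    fused[THETA] = fused.get(THETA, 0.0) + transfer  # unknown-set increment
    total = sum(fused.values())  # assumed: renormalize so the result stays a BBA
    return {k: v / total for k, v in fused.items()}, degree


layer = [({"low": 0.7, THETA: 0.3}, 0.9),
         ({"low": 0.6, THETA: 0.4}, 0.8),
         ({"high": 0.5, THETA: 0.5}, 0.2)]  # reliability below the gate
fused, degree = gated_layer_fusion(layer, tau=0.5)
```

In this example the third piece of evidence (reliability 0.2) never contributes mass to "high"; its conflict with the other two pieces (0.35 + 0.30) is moved to THETA instead, which is how gating keeps unreliable evidence from pulling the result.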
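The inter-layer step of claim 7 can be sketched as follows. The hard-evidence validity condition, the bias weight w and the way amplified masses are renormalized through THETA are hypothetical, since the patent leaves them as preset strategies. In the invalid branch, moving the conflict to the unknown set is the same idea as Yager's rule, used here as a named stand-in. THETA is a hypothetical key for the unknown set.

```python
THETA = "Theta"  # hypothetical key for the unknown set


def combine(m1, m2, mode):
    """Conjunctive combination over single levels plus THETA.

    mode="pcr6": conflict terms are split back proportionally (two-source PCR6);
    mode="to_theta": conflict terms are moved to THETA (Yager-style).
    Returns (fused BBA, total conflicting mass).
    """
    fused = {k: 0.0 for k in set(m1) | set(m2) | {THETA}}
    conflict = 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            p = x * y
            if p == 0.0:
                continue
            if a == b or b == THETA:
                fused[a] += p
            elif a == THETA:
                fused[b] += p
            else:
                conflict += p
                if mode == "pcr6":
                    fused[a] += x * p / (x + y)
                    fused[b] += y * p / (x + y)
                else:
                    fused[THETA] += p
    return fused, conflict


def bias_hard(m, w):
    """Amplify hard-layer level masses by w, then renormalize through THETA (assumed scheme)."""
    levels = {k: w * v for k, v in m.items() if k != THETA}
    s = sum(levels.values())
    if s > 1.0:  # amplification overshoots: rescale the levels to sum to 1
        levels = {k: v / s for k, v in levels.items()}
    levels[THETA] = max(0.0, 1.0 - sum(levels.values()))
    return levels


def interlayer_fusion(m_hard, m_soft, hard_conflict, w=1.5, max_theta=0.5, max_conflict=0.3):
    # Hypothetical validity condition; the patent does not define it.
    valid = m_hard.get(THETA, 0.0) <= max_theta and hard_conflict <= max_conflict
    theta_result, k = combine(m_hard, m_soft, "to_theta")  # k: conflict of the unbiased pair
    if not valid:
        return theta_result, k
    fused, _ = combine(bias_hard(m_hard, w), m_soft, "pcr6")
    return fused, k


hard = {"high": 0.6, "low": 0.1, THETA: 0.3}
soft = {"low": 0.5, THETA: 0.5}
valid_result, k = interlayer_fusion(hard, soft, hard_conflict=0.1)
invalid_result, _ = interlayer_fusion(hard, soft, hard_conflict=0.5)
```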
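The trend criterion and output step of claim 8 are sketched below. The threshold values, the "every recent increment exceeds the threshold" criterion and the attenuation factor (1 - m(THETA))·(1 - K) are assumptions; the patent only says that the unknown-set mass and the inter-layer conflict degree are converted into an attenuation factor. THETA is a hypothetical key for the unknown set.

```python
THETA = "Theta"  # hypothetical key for the unknown set


def conflict_trend(history, k_thr, dk_thr, n_windows):
    """history: inter-layer conflict degrees ordered by window index; the last entry is current."""
    trend_flag = history[-1] > k_thr
    increments = [b - a for a, b in zip(history, history[1:])]
    inc_flags = [d > dk_thr for d in increments]
    recent = inc_flags[-n_windows:]
    # assumed criterion: every one of the last n_windows increments exceeds the threshold
    rising = len(recent) == n_windows and all(recent)
    return {"above_threshold": trend_flag, "rising": rising, "rising_count": sum(recent)}


def risk_output(fused, k):
    """Risk level = level with the largest mass; confidence attenuated by THETA mass and conflict."""
    levels = {lvl: m for lvl, m in fused.items() if lvl != THETA}
    level = max(levels, key=levels.get)
    attenuation = (1.0 - fused.get(THETA, 0.0)) * (1.0 - k)  # assumed attenuation form
    return level, levels[level] * attenuation


trend = conflict_trend([0.10, 0.15, 0.22, 0.30], k_thr=0.25, dk_thr=0.04, n_windows=3)
level, confidence = risk_output({"low": 0.2, "high": 0.7, THETA: 0.1}, k=0.2)
```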

Description

Health risk assessment method and system based on big data analysis

Technical Field

The invention relates to the technical field of medical and health big data, and in particular to a health risk assessment method and system based on big data analysis.

Background

With the rapid development of wearable devices, electronic medical record systems, health management platforms and medical testing technologies, personal health data now come from many sources, are large in scale and are updated frequently. Existing health risk assessment techniques usually rely on physical examination indicators, historical medical records or data from a single monitoring device, and evaluate an individual's health state through statistical analysis or machine learning models; they have found some use in chronic disease management, risk early warning and decision support. In practice, however, multi-source health data differ markedly in acquisition frequency, data quality, timeliness and reliability, and different sources often give inconsistent or even contradictory judgments of the same health risk, which poses a major challenge for health risk assessment.
To fuse multi-source health data, the prior art generally performs unified modeling by feature concatenation, weighted averaging or traditional probabilistic models. Some methods introduce evidence theory to fuse multi-source information and handle uncertainty, but most existing evidence fusion methods use fixed combination rules and do not adequately distinguish the credibility of each data source, so information amplification or distorted results readily occur in high-conflict scenarios. In addition, the prior art usually treats all health data as equal evidence in fusion and lacks a layered mechanism that separates key, high-reliability data from auxiliary, low-reliability data, which makes assessment results sensitive to noisy and abnormal data and insufficiently stable. How to provide a health risk assessment method and system based on big data analysis that overcomes these problems is therefore a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a health risk assessment method and system based on big data analysis that adaptively handle conflicting information in multi-source health data by introducing a reliability-gated, layered fusion mechanism with a hard evidence layer and a soft evidence layer, combined with the PCR6 conflict redistribution rule. This reduces the interference of highly conflicting data with the assessment result and improves the stability, reliability and robustness of health risk assessment in complex, multi-source scenarios with inconsistent data.
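The background's point that fixed combination rules can distort results under high conflict is easy to see with Dempster's rule, the classic fixed rule in evidence theory. The sketch below reproduces Zadeh's well-known example using hypothetical risk levels: both sources give "medium" only 0.1, yet normalization by 1 - K hands it all the mass.

```python
def dempster(m1: dict[str, float], m2: dict[str, float]) -> tuple[dict[str, float], float]:
    """Dempster's rule for BBAs on single hypotheses only (no unknown set)."""
    agree: dict[str, float] = {}
    k = 0.0  # total conflicting mass
    for a, x in m1.items():
        for b, y in m2.items():
            if a == b:
                agree[a] = agree.get(a, 0.0) + x * y
            else:
                k += x * y
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # normalizing by 1 - K spreads the conflict over whatever hypotheses survived
    return {h: v / (1.0 - k) for h, v in agree.items()}, k


# Zadeh-style example with hypothetical risk levels
source_1 = {"high": 0.9, "medium": 0.1}
source_2 = {"low": 0.9, "medium": 0.1}
fused, k = dempster(source_1, source_2)
```

The conflict here is K = 0.99. PCR6 avoids this outcome by returning each conflict term only to the two levels that produced it instead of rescaling the small agreeing mass.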
According to an embodiment of the invention, the health risk assessment method based on big data analysis comprises the following steps: acquiring multi-source health data of a target user and standardizing the data to obtain multi-modal time-series samples over a sliding time window; generating a feature vector and a quality vector for each modality of the multi-modal time-series samples, calculating a reliability coefficient from the quality vector, and mapping the feature vector to a basic belief assignment (BBA); dividing the BBAs into a hard evidence layer and a soft evidence layer and applying reliability discounting to obtain hard-layer discounted evidence and soft-layer discounted evidence; calculating and routing the conflicting mass of the hard-layer discounted evidence, transferring the conflicting mass of evidence whose reliability coefficient is below a gating threshold to the unknown set, and redistributing the remaining conflicting mass with the PCR6 rule to obtain hard-layer fusion evidence and a hard-layer conflict degree; routing the conflict of the soft-layer discounted evidence and redistributing it with the PCR6 rule to obtain soft-layer fusion evidence and a soft-layer conflict degree; fusing the hard-layer fusion evidence and the soft-layer fusion evidence across layers, calculating the inter-layer conflicting mass and redistributing it with the PCR6 rule, increasing the weight assigned to the hard layer when the hard-evidence validity condition is met and otherwise transferring the inter-layer conflicting mass to the unknown set to raise the uncertainty, thereby obtaining an inter-layer fusion result and an inter-layer conflict degree; and evaluating a conflict trend criterion using the inter-layer conflict degree, and outputting a risk level and