CN-121980607-A - Data security protection method and data security protection system

Abstract

The application relates to a data security protection method and a data security protection system in the technical field of data processing. The method is applied to a medical information system and comprises: acquiring heterogeneous data in the medical information system, the heterogeneous data comprising text data, image data, and physiological signal data; inputting the heterogeneous data into a multimodal fusion model and outputting a threat characterization vector in a preset format, the threat characterization vector representing the degree of abnormality of the heterogeneous data; performing semantic parsing on an operation instruction using a large language model and calculating a risk score in combination with the threat characterization vector, the risk score representing the degree of risk of the operation instruction that initiates access to data in the medical information system; and, when the risk score exceeds a preset first threshold, upgrading the encryption level of the accessed data and triggering an enhanced authentication flow. The application can improve the data security protection performance of the medical information system.

Inventors

PENG HUI
WANG HUANHUAN
HUANG YAO
CHENG DEBIN

Assignees

中国电子产品可靠性与环境试验研究所（（工业和信息化部电子第五研究所）（中国赛宝实验室））

Dates

Publication Date: 20260505
Application Date: 20251231

Claims (10)

1. A data security protection method applied to a medical information system, the method comprising: acquiring heterogeneous data in the medical information system, wherein the heterogeneous data comprises text data, image data, and physiological signal data; inputting the heterogeneous data into a multimodal fusion model and outputting a threat characterization vector in a preset format, wherein the threat characterization vector represents the degree of abnormality of the heterogeneous data; performing semantic parsing on an operation instruction using a large language model and calculating a risk score in combination with the threat characterization vector, wherein the risk score represents the degree of risk of the operation instruction that initiates access to data in the medical information system; and, when the risk score exceeds a preset first threshold, upgrading the encryption level of the accessed data and triggering an enhanced authentication flow.
2. The method of claim 1, wherein the multimodal fusion model includes a natural language processing network, a convolutional neural network, a time-series neural network, a cross-modal attention mechanism, and an interactive feature fusion algorithm, and wherein inputting the heterogeneous data into the multimodal fusion model and outputting the threat characterization vector in the preset format comprises: inputting the text data into the natural language processing network for feature extraction to generate text features; inputting the image data into the convolutional neural network for feature extraction to generate image features; inputting the physiological signal data into the time-series neural network for feature extraction to generate physiological features; performing spatio-temporal alignment on the text features, the image features, and the physiological features using the cross-modal attention mechanism to generate fused features; and mining causal anomalies among different data sources in the fused features using the interactive feature fusion algorithm to generate the threat characterization vector in the preset format.
3. The method of claim 1, wherein performing semantic parsing on the operation instruction using the large language model and calculating the risk score in combination with the threat characterization vector comprises: acquiring the operation instruction and operation-related data corresponding to the operation instruction; inputting the operation instruction and the operation-related data into the large language model for semantic parsing to generate an operation intention vector; and performing a joint calculation on the operation intention vector and the threat characterization vector to obtain the risk score.
4. The method according to claim 1, wherein the method further comprises: when the risk score exceeds the preset first threshold, inputting the operation instruction and the threat characterization vector into the large language model and outputting a causal relationship audit report; and updating the multimodal fusion network according to the audit result corresponding to the causal relationship audit report.
5. The method according to any one of claims 1-4, further comprising: acquiring threat characteristic information generated locally by a medical institution based on the threat characterization vector; performing privacy protection processing on the threat characteristic information to generate privacy protection data; aggregating the privacy protection data sent by each medical institution through a cloud-based federated learning architecture and training to generate a global threat detection model; and performing lightweight processing on the global threat detection model to obtain a lightweight model, and deploying the lightweight model to the local system and edge devices of each medical institution.
6. The method of claim 5, wherein the threat characteristic information comprises a threat feature vector summary and/or a parameter update gradient, and wherein performing privacy protection processing on the threat characteristic information to generate privacy protection data includes: adding noise to the threat feature vector summary and/or the parameter update gradient locally at the medical institution using a differential privacy technique to generate the privacy protection data.
7. The method according to any one of claims 1-4, further comprising: performing anomaly detection separately on the image data and the text data in the heterogeneous data to generate a corresponding image anomaly index and text anomaly index; determining whether an adversarial attack exists according to a combined judgment of the image anomaly index and the text anomaly index; if an adversarial attack is determined to exist, triggering a security isolation mechanism and storing the attack evidence in tamper-proof form; and sending characteristic information of the adversarial attack to a cloud-based federated learning architecture to update a global threat detection model.
8. A data security protection system applied to a medical information system, comprising: a multimodal sensing module for acquiring heterogeneous data in the medical information system, wherein the heterogeneous data comprises text data, image data, and physiological signal data; a threat characterization learning module for inputting the heterogeneous data into a multimodal fusion model and outputting a threat characterization vector in a preset format, wherein the threat characterization vector represents the degree of abnormality of the heterogeneous data; and a dynamic protection module for performing semantic parsing on an operation instruction using a large language model and calculating a risk score in combination with the threat characterization vector, wherein the risk score represents the degree of risk of the operation instruction that initiates access to data in the medical information system, and for upgrading the encryption level of the accessed data and triggering an enhanced authentication flow when the risk score exceeds a preset first threshold.
9. The system of claim 8, wherein the system further comprises: a federated learning module for aggregating privacy protection data sent by each medical institution, training to generate a global threat detection model, performing lightweight processing on the global threat detection model to obtain a lightweight model, and deploying the lightweight model to the local system and edge devices of each medical institution, wherein the privacy protection data is generated by the medical institution locally by generating threat characteristic information based on the threat characterization vector and performing privacy protection processing on the threat characteristic information.
10. The system of claim 8, wherein the system further comprises: an adversarial attack defense module for performing anomaly detection separately on the image data and the text data in the heterogeneous data to generate a corresponding image anomaly index and text anomaly index, determining whether an adversarial attack exists according to a combined judgment of the image anomaly index and the text anomaly index, triggering a security isolation mechanism and storing the attack evidence in tamper-proof form if an adversarial attack is determined to exist, and sending characteristic information of the adversarial attack to a cloud-based federated learning architecture to update a global threat detection model.
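
By way of illustration only, the noise-adding and aggregation steps of claims 5 and 6 might be sketched as follows. The claims do not name a specific differential privacy mechanism or aggregation rule; this sketch assumes L2 clipping followed by Gaussian noise, and plain averaging on the cloud side. The function names (`clip_l2`, `privatize`, `federated_average`) and the parameters `max_norm` and `noise_multiplier` are hypothetical and do not come from the application.

```python
import math
import random
from typing import List, Sequence


def clip_l2(vec: Sequence[float], max_norm: float) -> List[float]:
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def privatize(vec: Sequence[float], max_norm: float,
              noise_multiplier: float, rng: random.Random) -> List[float]:
    """Local step at one institution (assumed Gaussian mechanism).

    The update is clipped to bound its sensitivity, then zero-mean
    Gaussian noise with std = noise_multiplier * max_norm is added.
    """
    clipped = clip_l2(vec, max_norm)
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


def federated_average(updates: Sequence[Sequence[float]]) -> List[float]:
    """Cloud step: element-wise mean of the noised updates (assumed rule)."""
    if not updates:
        raise ValueError("no updates to aggregate")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    local_gradients = [[0.2, -1.5, 0.7], [0.1, -1.2, 0.9], [0.3, -1.4, 0.5]]
    noised = [privatize(g, max_norm=1.0, noise_multiplier=0.5, rng=rng)
              for g in local_gradients]
    print(federated_average(noised))
```

Only the noised vectors leave each institution in this sketch; the raw gradients stay local, which is the property the privacy protection step is intended to provide.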

Description

Data security protection method and data security protection system

Technical Field

The application relates to the technical field of data processing, and in particular to a data security protection method and a data security protection system.

Background

With the rapid development of intelligent medical systems, the volume of medical data is growing exponentially and covers multi-source heterogeneous forms such as electronic medical records, medical images, and physiological signal data from wearable devices. These data play a key role in disease diagnosis, remote monitoring, and AI-assisted therapy, but their high value also makes them a major target for network attacks. The prior art has significant limitations in medical data security protection, and medical data remains at very high risk of leakage and attack.

Disclosure of Invention

In view of the above, it is necessary to provide a data security protection method and a data security protection system that can improve the data security protection performance of a medical information system.
The application provides a data security protection method applied to a medical information system, the method comprising: acquiring heterogeneous data in the medical information system, wherein the heterogeneous data comprises text data, image data, and physiological signal data; inputting the heterogeneous data into a multimodal fusion model and outputting a threat characterization vector in a preset format, wherein the threat characterization vector represents the degree of abnormality of the heterogeneous data; performing semantic parsing on an operation instruction using a large language model and calculating a risk score in combination with the threat characterization vector, wherein the risk score represents the degree of risk of the operation instruction that initiates access to data in the medical information system; and, when the risk score exceeds a preset first threshold, upgrading the encryption level of the accessed data and triggering an enhanced authentication flow.
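By way of illustration only, the final step of the method (comparing the risk score with the first threshold, then upgrading the encryption level and triggering enhanced authentication) might be sketched as follows. The application does not define concrete encryption levels or a threshold value; the three-level `EncryptionLevel` enum, the `ProtectionDecision` type, the one-step upgrade, and the default threshold of 0.7 are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class EncryptionLevel(IntEnum):
    """Hypothetical encryption levels, ordered from weakest to strongest."""
    STANDARD = 1
    ENHANCED = 2
    MAXIMUM = 3


@dataclass(frozen=True)
class ProtectionDecision:
    encryption_level: EncryptionLevel
    require_enhanced_auth: bool


def decide_protection(risk_score: float,
                      current_level: EncryptionLevel,
                      first_threshold: float = 0.7) -> ProtectionDecision:
    """Apply the threshold rule to one access request.

    If the score strictly exceeds the threshold, the encryption level is
    raised by one step (capped at MAXIMUM) and enhanced authentication is
    required; otherwise the current level is kept.
    """
    if risk_score > first_threshold:
        upgraded = EncryptionLevel(
            min(current_level + 1, EncryptionLevel.MAXIMUM))
        return ProtectionDecision(upgraded, True)
    return ProtectionDecision(current_level, False)


if __name__ == "__main__":
    print(decide_protection(0.91, EncryptionLevel.STANDARD))
    print(decide_protection(0.35, EncryptionLevel.STANDARD))
```

The comparison is strict because the method acts when the score "exceeds" the threshold; a score exactly equal to the threshold leaves the protection level unchanged in this sketch.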
In one embodiment, the multimodal fusion model includes a natural language processing network, a convolutional neural network, a time-series neural network, a cross-modal attention mechanism, and an interactive feature fusion algorithm, and inputting the heterogeneous data into the multimodal fusion model and outputting the threat characterization vector in the preset format includes: inputting the text data into the natural language processing network for feature extraction to generate text features; inputting the image data into the convolutional neural network for feature extraction to generate image features; inputting the physiological signal data into the time-series neural network for feature extraction to generate physiological features; performing spatio-temporal alignment on the text features, the image features, and the physiological features using the cross-modal attention mechanism to generate fused features; and mining causal anomalies among different data sources in the fused features using the interactive feature fusion algorithm to generate the threat characterization vector in the preset format.
In one embodiment, performing semantic parsing on the operation instruction using the large language model and calculating the risk score in combination with the threat characterization vector includes: acquiring the operation instruction and operation-related data corresponding to the operation instruction; inputting the operation instruction and the operation-related data into the large language model for semantic parsing to generate an operation intention vector; and performing a joint calculation on the operation intention vector and the threat characterization vector to obtain the risk score.
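By way of illustration only, two of the steps in the embodiments above might be sketched as follows: fusing per-modality features with attention, and jointly calculating a risk score from an operation intention vector and a threat characterization vector. The application does not give formulas for either step. This sketch assumes that the three feature vectors have already been projected to a common dimension, uses unparameterized scaled dot-product attention in place of a trained cross-modal attention mechanism, and assumes a logistic function over the concatenated vectors for the joint calculation. The function names and the `weights` and `bias` parameters are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(features: Sequence[Vector]) -> List[float]:
    """Fuse modality features with scaled dot-product attention.

    Each modality acts as a query over all modalities (keys and values);
    the attended outputs are averaged into one fused feature vector.
    """
    if not features:
        raise ValueError("at least one modality is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modality features must share one dimension")
    scale = math.sqrt(dim)
    fused = [0.0] * dim
    for query in features:
        weights = softmax([dot(query, key) / scale for key in features])
        for w, value in zip(weights, features):
            for i in range(dim):
                fused[i] += w * value[i] / len(features)
    return fused


def risk_score(intent: Vector, threat: Vector,
               weights: Vector, bias: float = 0.0) -> float:
    """Assumed joint calculation: logistic score over [intent ; threat]."""
    joint = list(intent) + list(threat)
    if len(weights) != len(joint):
        raise ValueError("weights must match len(intent) + len(threat)")
    return 1.0 / (1.0 + math.exp(-(dot(weights, joint) + bias)))


if __name__ == "__main__":
    text_f, image_f, physio_f = [0.9, 0.1], [0.4, 0.6], [0.2, 0.8]
    fused = cross_modal_attention([text_f, image_f, physio_f])
    print(fused)
    print(risk_score(intent=[1.0, 0.0], threat=fused,
                     weights=[1.5, -0.5, 0.8, 0.8], bias=-1.0))
```

Because the logistic output lies strictly between 0 and 1, it can be compared directly against a preset first threshold. In a trained system the attention projections and the scoring weights would be learned rather than fixed as they are here.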
In one embodiment, the method further comprises: when the risk score exceeds the preset first threshold, inputting the operation instruction and the threat characterization vector into the large language model and outputting a causal relationship audit report; and updating the multimodal fusion network according to the audit result corresponding to the causal relationship audit report.
In one embodiment, the method further comprises: acquiring threat characteristic information generated locally by a medical institution based on the threat characterization vector; performing privacy protection processing on the threat characteristic information to generate privacy protection data; aggregating the privacy protection data sent by each medical institution through a cloud-based federated learning architecture and training to generate a global threat detection model; and performing lightweight processing on the global threat detection model to obtain a lightweight model, and deploying the lightweight model to a local system an