CN-121980610-A - Intelligent data security detection system based on semantic recognition


Abstract

The invention relates to the technical field of data security, and in particular to an intelligent data security detection system based on semantic recognition. The system is configured to execute the following steps: acquiring data to be detected from a tested system and recording acquisition metadata; applying a rule engine based on patterns, regular expressions and black/white lists to the original data for rapid pre-filtering and preliminary labeling; invoking a semantic recognition model and the rule engine to detect the preprocessed data in parallel; and comprehensively evaluating the model confidence, the rule matching result and the statistical anomaly degree based on a configurable fusion strategy. Because rule pre-filtering and the semantic recognition model work cooperatively, the system preserves real-time detection while accurately recognizing latent sensitive semantics in the data. Fusing the rule matching result, the semantic confidence and the anomaly features into a comprehensive evaluation improves the accuracy and reliability of risk judgment.

Inventors

HUANG GANG
ZHOU JINGJIE
XIANG QIULING
ZHOU QIAO
TANG JIAN
LIU XIN

Assignees

湖南省信网安科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260113

Claims (8)

1. An intelligent data security detection system based on semantic recognition, characterized in that the system is configured to perform the following steps of an intelligent data security detection method based on semantic recognition: acquiring data to be detected from a tested system and recording acquisition metadata; applying a rule engine based on patterns, regular expressions and black/white lists to the original data to perform rapid pre-filtering and preliminary labeling; invoking a semantic recognition model and the rule engine to perform parallel detection on the preprocessed data, and comprehensively evaluating the model confidence, the rule matching result and the statistical anomaly degree based on a configurable fusion strategy to generate candidate sensitive items and risk scores; performing irreversible desensitization, reversible desensitization, or context placeholder replacement on the confirmed sensitive information according to policies and risk levels, and enforcing data flow blocking or access restriction when necessary; grading events according to the comprehensive evaluation result and, for events that reach an alarm or circuit-breaking condition, sending an alarm or starting emergency handling according to configuration; and recording whole-process logs of acquisition, detection, desensitization and disposal, extracting abnormal patterns by clustering audit results, and using manual or automatic labels as feedback for incremental updating or retraining of models or rules.
2. The intelligent data security detection system based on semantic recognition according to claim 1, wherein the data to be detected is accessed from the tested system through an acquisition component, the acquisition component supports multiple data access modes, and source information and time information are recorded for the acquired data.
3. The intelligent data security detection system based on semantic recognition according to claim 1, wherein the rule engine pre-filters the original data based on preset pattern rules, regular expression rules and blacklist or whitelist rules, and applies a preliminary risk label to data that hits a rule.
4. The intelligent data security detection system based on semantic recognition of claim 1, wherein the semantic recognition model is used for performing semantic level analysis on data, recognizing sensitive content by combining context information, and fusing semantic analysis results with rule matching results to form risk assessment results.
5. The intelligent data security detection system based on semantic recognition according to claim 1, wherein the fusion evaluation result is used for generating candidate sensitive items and corresponding risk levels, and the risk levels are comprehensively determined by semantic analysis results, rule matching results and abnormal features.
6. The intelligent data security detection system based on semantic recognition as recited in claim 1, wherein the desensitization process selectively performs one or more of irreversible desensitization, reversible desensitization, or context placeholder substitution according to a risk level, and limits or blocks related data flows when a preset condition is met.
7. The intelligent data security detection system based on semantic recognition according to claim 1, wherein the system logs the detection, desensitization and disposal processes and updates or optimizes the rules or the semantic recognition model based on the logged results.
8. An intelligent data security detection system based on semantic recognition according to any one of claims 1-7, comprising a data acquisition module, a rule pre-filtering module, a semantic analysis module, a desensitization processing module, a risk assessment and notification module and an audit feedback module; the data acquisition module is used for acquiring data to be detected from the tested system and recording acquisition metadata; the rule pre-filtering module is used for rapidly screening and preliminarily labeling the data to be detected based on preset rules; the semantic analysis module is used for identifying semantic-level sensitive information in the preprocessed data, and for comprehensively evaluating the detection result in combination with the rule matching result and the anomaly features to generate a risk assessment result; the desensitization processing module is used for desensitizing information confirmed to be sensitive, or taking access restriction measures, according to the risk assessment result; the risk assessment and notification module is used for grading detected events and triggering an alarm or emergency handling when an event reaches a preset condition; the audit feedback module is used for recording whole-process information on data acquisition, detection, desensitization and disposal, and for using review results as feedback to optimize the detection rules or the semantic analysis model; the modules cooperate with each other to realize automatic detection and closed-loop processing of data security risks.
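
The claims name a configurable fusion strategy over model confidence, rule matching result and statistical anomaly degree, but do not fix a formula. The following is a minimal sketch under stated assumptions: a weighted average is used as the fusion rule, and the weights, thresholds, level names and the identifiers `FusionWeights`, `risk_score` and `risk_level` are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionWeights:
    """Hypothetical fusion configuration: one weight per input signal."""
    model: float = 0.5    # weight of the semantic model confidence
    rule: float = 0.3     # weight of the rule matching result
    anomaly: float = 0.2  # weight of the statistical anomaly degree


def _clamp(x: float) -> float:
    """Restrict a signal to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def risk_score(model_conf: float, rule_hit: bool, anomaly: float,
               w: FusionWeights = FusionWeights()) -> float:
    """Fuse the three signals into one score in [0, 1].

    Assumption: a normalized weighted average. The patent only says the
    strategy is configurable, so other fusion rules are equally possible.
    """
    total = w.model + w.rule + w.anomaly
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = (w.model * _clamp(model_conf)
           + w.rule * (1.0 if rule_hit else 0.0)
           + w.anomaly * _clamp(anomaly))
    return raw / total


def risk_level(score: float, alarm: float = 0.5, block: float = 0.8) -> str:
    """Grade an event; both thresholds are illustrative assumptions."""
    if score >= block:
        return "block"   # circuit-breaking: restrict or block the data flow
    if score >= alarm:
        return "alarm"   # notify operators
    return "log"         # record for audit only
```

Keeping the weights in a single frozen configuration object is one way to make the strategy adjustable per deployment without touching the scoring code.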

Description

Intelligent data security detection system based on semantic recognition

Technical Field

The invention relates to an intelligent data security detection system based on semantic recognition, and belongs to the technical field of data security.

Background

As information systems become increasingly digitized and networked, service systems generate and circulate large volumes of multi-type data during operation, including user information, service data and system configuration parameters. To prevent data leakage, the prior art usually detects and protects sensitive fields in data by means of rule matching, keyword recognition or regular expressions. Existing data security detection schemes mostly depend on static rules or manually configured policies. They have some detection capability for structured data, but struggle to accurately identify sensitive information at the underlying semantic level when facing natural language text, semi-structured data or data that is related across contexts. Meanwhile, traditional schemes often adopt a single detection or desensitization mechanism and lack differentiated handling for different risk levels. Because the detection, disposal and audit processes are disconnected from one another, it is also difficult to form a continuously optimized security closed loop.
Existing data security detection technology struggles to accurately identify semantic-level sensitive information in multi-source data while preserving real-time performance, and it lacks technical means for unified closed-loop management of detection, risk assessment, dynamic desensitization and audit feedback. As a result, the detection accuracy, disposal flexibility and capacity for sustained optimization of such systems are insufficient, and an improved intelligent data security detection system based on semantic recognition is needed to solve these problems.

Disclosure of Invention

The invention aims to provide an intelligent data security detection system based on semantic recognition that solves the following problems: existing data security detection technology struggles to accurately recognize semantic-level sensitive information in multi-source data while preserving real-time performance, and it lacks technical means for unified closed-loop management of detection, risk assessment, dynamic desensitization and audit feedback, so that the detection accuracy, disposal flexibility and capacity for sustained optimization of the system are insufficient.
To achieve the above purpose, the present invention provides the following technical solution: an intelligent data security detection system based on semantic recognition, the system being configured to perform the following steps of an intelligent data security detection method based on semantic recognition: acquiring data to be detected from a tested system and recording acquisition metadata; applying a rule engine based on patterns, regular expressions and black/white lists to the original data to perform rapid pre-filtering and preliminary labeling; invoking a semantic recognition model and the rule engine to perform parallel detection on the preprocessed data, and comprehensively evaluating the model confidence, the rule matching result and the statistical anomaly degree based on a configurable fusion strategy to generate candidate sensitive items and risk scores; performing irreversible desensitization, reversible desensitization, or context placeholder replacement on the confirmed sensitive information according to policies and risk levels, and enforcing data flow blocking or access restriction when necessary; grading events according to the comprehensive evaluation result and, for events that reach an alarm or circuit-breaking condition, sending an alarm or starting emergency handling according to configuration; and recording whole-process logs of acquisition, detection, desensitization and disposal, extracting abnormal patterns by clustering audit results, and using manual or automatic labels as feedback for incremental updating or retraining of models or rules.

Preferably, the data to be detected is accessed from the tested system through an acquisition component, and the acquisition component supports multiple data access modes and records source information and time information for the acquired data.
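
The rule pre-filtering and desensitization steps described above can be illustrated with a short sketch. Everything concrete in it is an assumption made for illustration: the two regular expressions, the blacklist and whitelist entries, the token format, and the names `prefilter` and `Desensitizer` are hypothetical, and the patent does not prescribe any particular hashing or tokenization scheme.

```python
import hashlib
import re

# Hypothetical rules; a real deployment would load these from configuration.
PATTERNS = {
    "phone": re.compile(r"\b1[3-9]\d{9}\b"),               # 11-digit mobile number
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # simplified e-mail shape
}
BLACKLIST = {"password", "secret_key"}   # keywords that always raise a label
WHITELIST = {"support@example.com"}      # known-safe values that are skipped


def prefilter(text: str) -> list[tuple[str, str]]:
    """Return preliminary (label, matched_text) pairs for one record."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if match.group() not in WHITELIST:
                hits.append((label, match.group()))
    lowered = text.lower()
    hits.extend(("blacklist", word) for word in sorted(BLACKLIST)
                if word in lowered)
    return hits


class Desensitizer:
    """Three desensitization modes named in the text, in simplified form."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}  # token -> original value

    def irreversible(self, value: str) -> str:
        # Assumption: a truncated SHA-256 digest. An unsalted hash of a
        # low-entropy value such as a phone number can be brute-forced,
        # so production use would need a keyed or salted scheme.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

    def reversible(self, value: str) -> str:
        # Replace the value with a token and keep the mapping for recovery.
        token = f"TOK{len(self._vault):06d}"
        self._vault[token] = value
        return token

    def restore(self, token: str) -> str:
        return self._vault[token]

    @staticmethod
    def placeholder(label: str) -> str:
        # Context placeholder: keeps the type of the item, hides the value.
        return f"[{label.upper()}]"
```

In this sketch the pre-filter only labels candidates; which of the three desensitization modes applies to a given hit would be decided later from the policy and the fused risk level.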
Preferably, the rule engine pre-filters the original data based on preset pattern rules, regular expression rules and blacklist or whitelist rules, and applies a preliminary risk label to data that hits a rule. Preferably, the semantic recognition model is used for performing semantic-level analysis on the data, recognizing sensitive content in combination with context information, and fusing the semantic analysis result with the rule matching result to form a risk assessment result. Preferably, the fusi