CN-121996790-A - Automatic classification and grading method, device, equipment, medium and product for structured data
Abstract
The invention discloses an automatic classification and grading method, device, equipment, medium and product for structured data, and relates to the technical field of data security and information processing. After the meta information of the structured data to be processed is input, the meta information is parsed and semantically processed to generate a semantic information structure. The semantic information structure is passed to a core reasoning engine that is built on a large language model and integrated with a classification and grading strategy knowledge base. The engine performs a single pass of logical reasoning on the semantic information structure according to the knowledge base and directly generates a decision result containing the category and security level of each field, and the result is then output. This forms an end-to-end automated flow with the large language model as its intelligent core, in which deep semantic understanding and reasoning replace traditional rule matching and simple machine learning models, so that the accuracy, efficiency and interpretability of data classification and grading are significantly improved and automated processing close to the level of a human expert is achieved.
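The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the knowledge base contents (`POLICY_KB`), the prompt wording, the JSON reply format, and the stand-in `fake_llm` are all hypothetical, and a real deployment would replace `fake_llm` with an actual model call.

```python
import json
from typing import Callable, Dict, List

# Hypothetical strategy knowledge base: category -> security level
# (higher = more sensitive). The source does not specify its contents.
POLICY_KB: Dict[str, int] = {
    "personal_identity": 4,
    "contact_info": 3,
    "business_record": 2,
    "public_info": 1,
}


def build_semantic_structure(meta: dict) -> dict:
    """Parse raw table metadata into one standardized structure."""
    return {
        "table": meta["table"],
        "fields": [
            {
                "name": col["name"],
                "type": col.get("type", "unknown"),
                "meaning": col.get("comment", ""),
            }
            for col in meta["columns"]
        ],
    }


def build_prompt(semantic: dict, kb: Dict[str, int]) -> str:
    """Combine the knowledge base and the semantic structure into one request."""
    return (
        "Policy (category -> security level):\n"
        + json.dumps(kb)
        + "\nClassify every field in one pass. Reply with a JSON list of objects "
        + "with keys field, category, level, reason.\nFields:\n"
        + json.dumps(semantic)
    )


def classify(meta: dict, llm: Callable[[str], str],
             kb: Dict[str, int] = POLICY_KB) -> List[dict]:
    """Run the whole flow with a single model call covering all fields."""
    semantic = build_semantic_structure(meta)
    decisions = json.loads(llm(build_prompt(semantic, kb)))
    expected = {f["name"] for f in semantic["fields"]}
    returned = {d["field"] for d in decisions}
    if returned != expected:
        raise ValueError("decisions do not match the input fields")
    for d in decisions:
        if d["category"] not in kb:
            raise ValueError(f"unknown category: {d['category']}")
    return decisions


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned answer."""
    return json.dumps([
        {"field": "id_no", "category": "personal_identity", "level": 4,
         "reason": "government ID number identifies a person"},
        {"field": "phone", "category": "contact_info", "level": 3,
         "reason": "mobile number can be used to reach a person"},
    ])


meta = {
    "table": "customer",
    "columns": [
        {"name": "id_no", "type": "varchar(18)", "comment": "ID card number"},
        {"name": "phone", "type": "varchar(11)", "comment": "mobile number"},
    ],
}
result = classify(meta, fake_llm)
```

Passing the model in as a callable keeps the sketch runnable without any network access, and the `reason` key is one possible way to carry the interpretability the abstract mentions. The two validation checks are an added safeguard, not something the source describes.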
Inventors
- HUANG BIFENG
- LI RUI
- CHEN ZHIJIE
Assignees
- 武汉壹品慧生活技术有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260115
Claims (10)
- 1. An automatic classification and grading method for structured data, characterized by comprising the following steps: inputting meta information of the structured data to be processed; parsing and semantically processing the meta information to generate a standardized semantic information structure; passing the semantic information structure to a core reasoning engine that is built on a large language model and integrated with a classification and grading strategy knowledge base, wherein the core reasoning engine performs a single pass of logical reasoning on the semantic information structure according to the classification and grading strategy knowledge base to directly generate a classification and grading decision result comprising the category and the security level of each data field in the structured data to be processed; and outputting the classification and grading decision result.
- 2. The method of claim 1, wherein the meta information includes database structure information or metadata information.
- 3. The method of claim 1, wherein parsing and semantically processing the meta information to generate a standardized semantic information structure comprises: parsing the meta information to obtain a parsing result containing the identified headers and field meanings; and semantically converting the parsing result into an information structure that can be understood by the core reasoning engine built on a large language model and integrated with a classification and grading strategy knowledge base, and taking that information structure as the standardized semantic information structure.
- 4. The automatic classification and grading method for structured data according to claim 1, wherein the large language model is replaced by a dedicated classification and grading model, the dedicated model being pre-trained and fine-tuned for the task of classifying and grading structured data and having fewer parameters than a general-purpose large language model.
- 5. The automatic classification and grading method for structured data according to claim 1, wherein outputting the classification and grading decision result comprises: outputting the classification and grading decision result in the form of a structured list and/or a visual chart.
- 6. The automatic classification and grading method for structured data according to claim 1, wherein after outputting the classification and grading decision result, the method further comprises: for each data field in the structured data to be processed, extracting the corresponding security level from the classification and grading decision result, and triggering and executing a data protection strategy according to the security level to protect the corresponding data, wherein the data protection strategy comprises a data encryption mode and/or a data desensitization mode.
- 7. An automatic classification and grading device for structured data, characterized by comprising an information input unit, an information parsing unit, a logical reasoning unit and a result output unit that are communicatively connected in sequence; the information input unit is used for inputting meta information of the structured data to be processed; the information parsing unit is used for parsing and semantically processing the meta information to generate a standardized semantic information structure; the logical reasoning unit is used for passing the semantic information structure to a core reasoning engine that is built on a large language model and integrated with a classification and grading strategy knowledge base, wherein the core reasoning engine performs a single pass of logical reasoning on the semantic information structure according to the classification and grading strategy knowledge base to directly generate a classification and grading decision result comprising the category and the security level of each data field in the structured data to be processed; and the result output unit is used for outputting the classification and grading decision result.
- 8. A computer device, comprising a storage module, a processing module and a transceiver module that are communicatively connected in sequence, wherein the storage module is used for storing a computer program, the transceiver module is used for receiving and transmitting messages, and the processing module is used for reading the computer program and executing the automatic classification and grading method for structured data according to any one of claims 1-6.
- 9. A computer-readable storage medium having instructions stored thereon which, when executed on a computer, perform the automatic classification and grading method for structured data according to any one of claims 1-6.
- 10. A computer program product comprising a computer program or instructions which, when executed by a computer, implement the automatic classification and grading method for structured data according to any one of claims 1-6.
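The protection step in claim 6 (reading each field's security level from the decision result and triggering encryption and/or desensitization) might look roughly like the sketch below. The level-to-action mapping, the masking rule, and `placeholder_encrypt` are illustrative assumptions; the claims do not specify them, and the placeholder is deliberately not real encryption.

```python
from typing import Callable, Dict, List

# Assumed mapping from security level to protection action (not from the source).
PROTECTION_BY_LEVEL: Dict[int, str] = {1: "none", 2: "mask", 3: "mask", 4: "encrypt"}


def mask(value: str, visible: int = 4) -> str:
    """Desensitize a value by hiding all but its last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def protect_row(row: Dict[str, str], decisions: List[dict],
                encrypt: Callable[[str], str]) -> Dict[str, str]:
    """Apply the protection action that matches each field's security level."""
    levels = {d["field"]: d["level"] for d in decisions}
    strictest = max(PROTECTION_BY_LEVEL)
    protected = {}
    for field, value in row.items():
        # Fields with no decision are treated as the strictest level (a design choice).
        action = PROTECTION_BY_LEVEL[levels.get(field, strictest)]
        if action == "encrypt":
            protected[field] = encrypt(value)
        elif action == "mask":
            protected[field] = mask(value)
        else:
            protected[field] = value
    return protected


def placeholder_encrypt(value: str) -> str:
    """NOT real encryption; marks where a vetted cipher would be called."""
    return "<encrypted:%d chars>" % len(value)


decisions = [
    {"field": "nickname", "level": 1},
    {"field": "phone", "level": 3},
    {"field": "id_no", "level": 4},
]
row = {"nickname": "lily", "phone": "13800001234", "id_no": "420101199001011234"}
protected = protect_row(row, decisions, placeholder_encrypt)
```

Defaulting unknown fields to the strictest level errs on the side of over-protection, which seems the safer failure mode when a field was missed by the classification step.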
Description
Automatic classification and grading method, device, equipment, medium and product for structured data
Technical Field
The invention belongs to the technical field of data security and information processing, and particularly relates to an automatic classification and grading method, device, equipment, medium and product for structured data.
Background
Data classification and grading is a cornerstone of data security governance and compliance management. It aims to classify data according to its sensitivity, importance and business attributes and to determine the corresponding security protection level. Accurate and efficient data classification and grading is a precondition for differentiated data security controls and for meeting the requirements of relevant laws and regulations.
Currently, the industry implements data classification and grading mainly in the following ways:
(A) Rule-based matching. This approach is simple to implement, but it is essentially rigid string or pattern matching and cannot understand the specific business context and deeper semantics of the data. For fields whose names are ambiguous, abbreviated or complex in meaning, misjudgments and omissions occur easily, so the precision and recall of the classification results are unsatisfactory.
(B) Traditional machine learning. This approach reduces the workload of writing rules by hand to some extent, but its effectiveness depends heavily on the quality of feature engineering and on the scale and representativeness of the training data. When services change rapidly and data patterns vary widely, the model generalizes poorly, adapts badly to new and unseen data patterns, and is costly to maintain and update.
(C) Manual classification by human experts. This approach can draw on the domain knowledge of human experts, but it is extremely inefficient and cannot meet the demand for processing massive enterprise data. Manual judgment is also highly subjective: standards are hard to unify across experts, cases with fuzzy boundaries still require repeated discussion and confirmation, the degree of automation is low, labor costs are high, and consistency is hard to guarantee in large-scale data governance projects.
In summary, existing data classification and grading techniques mainly have the following defects:
(1) Low recognition precision and recall. Rule-based approaches cannot understand semantics and machine learning approaches generalize poorly, so neither achieves high-precision recognition on complex and changing real enterprise data, and mislabeling and missed labeling are serious.
(2) Lack of reasoning and interpretability. Most existing techniques are a "black box" or a simple matching process. They cannot reason logically through a classification and grading decision as a human expert would, and cannot give a clear, credible explanation for a result, which is a major weakness in the data security field, where compliance auditing and accountability are emphasized.
(3) Limited automation and intelligence. Existing approaches either rely on maintaining a large number of manual rules, or require heavy feature engineering and model training, or are done entirely by hand. None achieves true end-to-end intelligent processing, so the automated flow is fragile and struggles with massive data and long-tail cases.
Therefore, the art needs a technical scheme for data classification and grading that deeply understands data semantics, has logical reasoning capability and achieves a high degree of automation, so as to overcome the inherent shortcomings of the prior art in accuracy, interpretability and efficiency.
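The weakness of approach (A) on abbreviated field names can be illustrated with a small sketch. The rules and field names here are hypothetical examples chosen for illustration (`sjh` and `zjhm` stand for pinyin-style abbreviations of "mobile number" and "ID document number"); they are not taken from the source.

```python
import re

# Hypothetical keyword rules of the kind a rule-based classifier relies on.
RULES = [
    (re.compile(r"phone|mobile", re.IGNORECASE), "contact_info"),
    (re.compile(r"id_?card|passport", re.IGNORECASE), "personal_identity"),
]


def rule_classify(field_name: str) -> str:
    """Return the category of the first rule whose pattern matches the name."""
    for pattern, category in RULES:
        if pattern.search(field_name):
            return category
    return "unclassified"


fields = ["mobile_phone", "id_card_no", "sjh", "zjhm"]
results = {name: rule_classify(name) for name in fields}
# The descriptive names match a rule; the abbreviated ones fall through
# to "unclassified" even though they hold the same kind of data.
```

Each missed abbreviation needs another hand-written rule, which is the maintenance burden and missed-label problem the background describes.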
Disclosure of Invention
The invention aims to provide an automatic classification and grading method, device, computer equipment, computer-readable storage medium and computer program product for structured data, to solve the problems of low recognition precision and recall, lack of reasoning and interpretation capability, and/or limited automation and intelligence in existing data classification and grading techniques.
To achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, a method for automatically classifying and grading structured data is provided, including: inputting meta information of the structured data to be processed; parsing and semantically processing the meta information to generate a standardized semantic information structure; the semantic information structure is passed to a core reasoning engine that is built on a large language model and integrated with a classification and grading strategy knowledge base, and the c