CN-121997361-A - Method, system, electronic device and storage medium for generating detection strategy of data


Abstract

The application discloses a method, a system, an electronic device and a storage medium for generating a data detection strategy, and relates to the field of data security. The method comprises: acquiring a file to be processed; invoking a data processing model to determine a general category, a sensitivity index and keywords of the file to be processed; determining a recommendation level of a detection rule to be generated based on the general category and the sensitivity index; generating a rule name of the detection rule based on the keywords; generating detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords; and generating a detection strategy of the client from the recommendation level, the rule name and the detection logic information. The application solves the technical problem of low generation efficiency of data detection strategies.

Inventors

LIU YUXUAN
FANG TINGTING
MAO JIA

Assignees

阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)

Dates

Publication Date: 2026-05-08
Application Date: 2024-11-07

Claims (19)

1. A method for generating a data detection strategy, characterized by comprising the following steps: acquiring a file to be processed, wherein the file to be processed comprises log data to be detected that is generated by a client during operation; invoking a data processing model to determine a general category, a sensitivity index and keywords of the file to be processed, wherein the data processing model is obtained at least by training based on a large model, the general category represents a category having a general meaning, the sensitivity index represents the degree of sensitivity of text in the file to be processed, and the keywords describe the text structure of the text; determining a recommendation level of a detection rule to be generated based on the general category and the sensitivity index, generating a rule name of the detection rule based on the keywords, and generating detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords, wherein the detection rule is a rule for performing security detection on the client, the recommendation level represents the priority with which the detection rule is recommended, and the detection logic information represents the logic applied when the detection rule is executed; and generating a detection strategy of the client from the recommendation level, the rule name and the detection logic information, wherein the detection strategy represents the generated detection rule.
2. The method according to claim 1, wherein invoking the data processing model to determine the general category, the sensitivity index and the keywords of the file to be processed comprises: invoking a classification model in the data processing model to classify the file to be processed to obtain the general category, wherein the classification model is obtained by training a corresponding deep learning model with general category samples; invoking a sensitivity evaluation model in the data processing model to perform sensitivity evaluation on the file to be processed to obtain the sensitivity index, wherein the sensitivity evaluation model is obtained by training a corresponding deep learning model with sensitivity index samples; and invoking a data recognition model in the data processing model to recognize the keywords from the file to be processed, wherein the data recognition model is obtained by training based on a large language model.
3. The method according to claim 2, wherein invoking the data recognition model in the data processing model to recognize the keywords from the file to be processed comprises: determining prompt information corresponding to the file to be processed, wherein the prompt information represents a plurality of steps for extracting the keywords from the file to be processed, the steps being logically related to one another; and guiding the data recognition model with the prompt information to recognize the keywords from the file to be processed.
4. The method according to claim 2, wherein invoking the classification model in the data processing model to classify the file to be processed to obtain the general category comprises: invoking a multi-layer perceptron in the classification model to classify different texts in the file to be processed into different general categories.
5. The method according to claim 2, wherein invoking the sensitivity evaluation model in the data processing model to perform sensitivity evaluation on the file to be processed to obtain the sensitivity index comprises: invoking the sensitivity evaluation model to determine the text content of the text in the file to be processed, and performing sensitivity evaluation on the text content to obtain the sensitivity index, wherein the sensitivity index is positively correlated with the degree of sensitivity of the text content.
6. The method according to claim 2, wherein generating the rule name of the detection rule based on the keywords comprises: invoking the data recognition model to analyze the keywords and different texts in the file to be processed to obtain text content categories of the different texts; and clustering the text content categories of the different texts to obtain the rule name.
7. The method according to claim 6, wherein clustering the text content categories of the different texts to obtain the rule name comprises: performing one-pass clustering on the text content categories of the different texts to obtain a clustering result; and determining, as the rule name, the text content category, among the text content categories of the different texts, whose similarity to the clustering result is higher than a similarity threshold.
8. The method according to claim 7, further comprising: heuristically filtering out, from the text content categories of the different texts, any text content category that is in an abnormal state; wherein performing one-pass clustering on the text content categories of the different texts to obtain the clustering result comprises: performing one-pass clustering on the text content categories of the filtered texts to obtain the clustering result.
9. The method according to claim 1, wherein determining the recommendation level of the detection rule to be generated based on the general category and the sensitivity index comprises: determining a default sensitivity index corresponding to the general category, wherein the default sensitivity index represents the default degree of sensitivity of text in files to be processed of that general category; linearly combining the default sensitivity index and the sensitivity index to obtain a combined result; and determining the recommendation level that matches the combined result.
10. The method according to claim 1, wherein generating the detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords comprises: converting at least the general category, the sensitivity index and the keywords into a feature vector; inputting the feature vector as a positive sample into a decision tree structure model, and performing machine learning on the positive sample with the decision tree structure model to obtain a decision tree classifier; acquiring identification logic information of the decision tree classifier, wherein the identification logic information represents the logic by which the trained decision tree classifier identifies input data; and determining the identification logic information as the detection logic information.
11. The method according to claim 10, further comprising: invoking a data recognition model to recognize entity objects in the text of the file to be processed, wherein the data recognition model is obtained by training based on a large language model; wherein converting at least the general category, the sensitivity index and the keywords into the feature vector comprises: combining the number of entity objects, the recommendation level, the file identifier of the file to be processed, the general category, the sensitivity index and the keywords to obtain the feature vector.
12. The method according to any one of claims 1 to 11, further comprising: selecting a target detection strategy from a plurality of the detection strategies in response to a selection operation; determining the target detection strategy as a sub-category; and classifying the sub-category under the parent category to which the general category belongs.
13. The method according to any one of claims 1 to 11, further comprising: if the detection strategy is in an abnormal state, adjusting the detection strategy in response to an adjustment operation on the detection strategy, and training the data processing model with the adjusted detection strategy; and acquiring a target file that triggers the detection strategy after the detection strategy goes online, and training the data processing model with the target file.
14. A method for generating a data detection strategy, characterized by comprising the following steps: acquiring a file to be processed from a data asset of a client, wherein the category of the data asset corresponds to a data usage scenario of the client, and the file to be processed comprises log data to be detected that is generated by the client during operation in the data usage scenario; invoking a classification model to classify the file to be processed to obtain a general category, invoking a sensitivity evaluation model to perform sensitivity evaluation on the file to be processed to obtain a sensitivity index, and invoking a data recognition model to extract keywords from the file to be processed, wherein the classification model is obtained by training a corresponding deep learning model with general category samples, the general category represents a category having a general meaning, the sensitivity evaluation model is obtained by training a corresponding deep learning model with sensitivity index samples, the sensitivity index represents the degree of sensitivity of text in the file to be processed, the data recognition model is obtained by training based on a large language model, and the keywords describe the text structure of the text; determining a recommendation level of a detection rule to be generated based on the general category and the sensitivity index, generating a rule name of the detection rule based on the keywords, and generating detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords, wherein the detection rule is a rule for performing security detection on the client, the recommendation level represents the priority with which the detection rule is recommended, and the detection logic information represents the logic applied when the detection rule is executed; and generating a detection strategy of the client from the recommendation level, the rule name and the detection logic information, wherein the detection strategy represents the generated detection rule.
15. A method for generating a data detection strategy, characterized by comprising the following steps: acquiring a file to be processed by calling a first interface, wherein the first interface comprises a first parameter whose value is the file to be processed, and the file to be processed comprises log data to be detected that is generated by a client during operation; invoking a data processing model to determine a general category, a sensitivity index and keywords of the file to be processed, wherein the data processing model is obtained at least by training based on a large model, the general category represents a category having a general meaning, the sensitivity index represents the degree of sensitivity of text in the file to be processed, and the keywords describe the text structure of the text; determining a recommendation level of a detection rule to be generated based on the general category and the sensitivity index, generating a rule name of the detection rule based on the keywords, and generating detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords, wherein the detection rule is a rule for performing security detection on the client, the recommendation level represents the priority with which the detection rule is recommended, and the detection logic information represents the logic applied when the detection rule is executed; generating a detection strategy of the client from the recommendation level, the rule name and the detection logic information, wherein the detection strategy represents the generated detection rule; and outputting the detection strategy by calling a second interface, wherein the second interface comprises a second parameter whose value is the detection strategy.
16. A system for generating a data detection strategy, comprising: a client, configured to upload a file to be processed, wherein the file to be processed comprises log data to be detected that is generated by the client during operation; the system being configured to invoke a data processing model to determine a general category, a sensitivity index and keywords of the file to be processed, wherein the data processing model is obtained at least by training based on a large model, the general category represents a category having a general meaning, the sensitivity index represents the degree of sensitivity of text in the file to be processed, and the keywords describe the text structure of the text; to determine a recommendation level of a detection rule to be generated based on the general category and the sensitivity index, generate a rule name of the detection rule based on the keywords, and generate detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords, wherein the detection rule is a rule for performing security detection on the client, the recommendation level represents the priority with which the detection rule is recommended, and the detection logic information represents the logic applied when the detection rule is executed; and to generate the detection strategy of the client.
17. An electronic device, comprising: a memory storing an executable program; and a processor configured to run the program, wherein the program, when run, performs the method according to any one of claims 1 to 15.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored executable program, wherein the executable program, when run, controls a device on which the storage medium is located to perform the method according to any one of claims 1 to 15.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 15.
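Claim 1 fixes only the data flow. The model's three outputs feed three derivation steps, and their results are bundled into a detection strategy. Below is a minimal Python sketch of that flow, assuming the model outputs are already available. The `ModelOutput` and `DetectionStrategy` types, the 0.7 cut-off, the naming scheme and the logic-string syntax are all hypothetical illustrations, not the applicant's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ModelOutput:
    """Outputs of the data processing model for one file to be processed."""
    general_category: str
    sensitivity_index: float  # assumed to lie in [0, 1]
    keywords: list[str] = field(default_factory=list)


@dataclass
class DetectionStrategy:
    recommendation_level: str
    rule_name: str
    detection_logic: str


def generate_strategy(out: ModelOutput) -> DetectionStrategy:
    # Recommendation level. Hypothetical stand-in that uses only the
    # sensitivity index (claim 9 describes combining it with the category).
    level = "high" if out.sensitivity_index >= 0.7 else "normal"
    # Rule name from the keywords (hypothetical naming scheme).
    rule_name = f"{out.general_category}:{'_'.join(sorted(out.keywords)) or 'unnamed'}"
    # Detection logic from category, sensitivity index and keywords
    # (hypothetical boolean-expression syntax).
    keyword_clause = " OR ".join(f"contains('{k}')" for k in out.keywords) or "FALSE"
    logic = (
        f"category == '{out.general_category}' "
        f"AND sensitivity >= {out.sensitivity_index:.2f} AND ({keyword_clause})"
    )
    return DetectionStrategy(level, rule_name, logic)
```

For example, `generate_strategy(ModelOutput("financial", 0.82, ["iban", "card_no"]))` yields level `"high"` and rule name `"financial:card_no_iban"`. Keeping the three derivations separate makes it easy to swap in the claim 9, claim 6 and claim 10 variants independently.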
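Claim 3 says only that the prompt lays out several logically related steps for extracting keywords. It does not disclose the steps themselves. The following sketch assembles such a step-by-step prompt. The step wording in `KEYWORD_STEPS` and the function name `build_keyword_prompt` are hypothetical, and sending the prompt to a large language model is left out.

```python
from collections.abc import Sequence

# Hypothetical step list: each step refers back to the result of the previous one.
KEYWORD_STEPS = (
    "Split each log line into its fields or segments.",
    "Describe the structure of each field found in step 1 "
    "(e.g. key=value pair, fixed-length digits, timestamp).",
    "From the structures found in step 2, choose keywords that describe the text structure.",
    "Return the keywords from step 3 as a JSON array of strings.",
)


def build_keyword_prompt(file_text: str, steps: Sequence[str] = KEYWORD_STEPS) -> str:
    """Assemble a numbered, step-by-step prompt that guides an LLM to extract keywords."""
    numbered = "\n".join(f"Step {i}: {step}" for i, step in enumerate(steps, start=1))
    return (
        "You extract structural keywords from log data.\n"
        "Follow the steps in order; each step uses the result of the previous one.\n"
        f"{numbered}\n\n"
        f"Log data:\n{file_text}"
    )
```

Because the steps are passed in as a sequence, different data usage scenarios could supply their own step lists without changing the prompt template.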
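Claims 6 to 8 name three steps: heuristic filtering of abnormal text content categories, one-pass clustering, and selection of a category that is sufficiently similar to the clustering result. The similarity measure and the filtering heuristics are not specified. The sketch below makes three assumptions. Character-bigram Jaccard similarity stands in for whatever embedding the model uses. An "abnormal" label is taken to be empty, overly long or free of letters. The rule name is read as the cluster medoid, which is one possible reading of claim 7.

```python
def bigrams(text: str) -> set[str]:
    """Character bigrams of a lower-cased string (the string itself if shorter than 2)."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)} or {t}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' bigram sets (a stand-in for embeddings)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def is_abnormal(category: str, max_len: int = 40) -> bool:
    """Hypothetical claim 8 heuristic: empty, overly long, or letter-free labels."""
    s = category.strip()
    return not s or len(s) > max_len or not any(ch.isalpha() for ch in s)


def one_pass_cluster(items: list[str], threshold: float) -> list[list[str]]:
    """Single pass over the items: join the most similar existing cluster (compared
    with its first member) if similarity >= threshold, otherwise open a new cluster."""
    clusters: list[list[str]] = []
    for item in items:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(item, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([item])
        else:
            best.append(item)
    return clusters


def medoid(cluster: list[str]) -> str:
    """Member with the highest mean similarity to the other members of its cluster."""
    n = len(cluster)
    if n == 1:
        return cluster[0]

    def mean_sim(i: int) -> float:
        return sum(jaccard(cluster[i], cluster[j]) for j in range(n) if j != i) / (n - 1)

    return cluster[max(range(n), key=mean_sim)]


def rule_names(categories: list[str], threshold: float = 0.5) -> list[str]:
    """Filter abnormal labels, cluster the rest in one pass, name each cluster."""
    kept = [c.strip() for c in categories if not is_abnormal(c)]
    return [medoid(c) for c in one_pass_cluster(kept, threshold)]
```

On the labels `"user phone number"`, `"user phone numbers"`, `"customer phone number"`, `"bank card number"`, `"bank card numbers"`, `"12345"` and `""`, the last two are filtered out. The remaining five form two clusters, named `"user phone number"` and `"bank card number"`. One-pass clustering depends on input order, which is the usual trade-off for its single linear scan.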
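Claim 9 leaves the default-sensitivity table, the combination weights and the level boundaries unspecified. The sketch below assumes that both indices lie in [0, 1] and uses a convex combination with one weight `alpha`. The category names, default values and level thresholds are hypothetical.

```python
# Hypothetical default sensitivity index per general category.
DEFAULT_SENSITIVITY = {
    "personal_identity": 0.9,
    "financial": 0.8,
    "operational_log": 0.3,
}

# Hypothetical level thresholds, checked from highest to lowest.
LEVEL_THRESHOLDS = [(0.75, "L3"), (0.5, "L2"), (0.0, "L1")]


def recommendation_level(category: str, sensitivity: float, alpha: float = 0.5) -> str:
    """Linearly combine the category default with the model's sensitivity index,
    then return the first level whose threshold the combined score reaches."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    default = DEFAULT_SENSITIVITY.get(category, 0.5)  # hypothetical fallback for unknown categories
    combined = alpha * default + (1.0 - alpha) * sensitivity
    for threshold, level in LEVEL_THRESHOLDS:
        if combined >= threshold:
            return level
    return LEVEL_THRESHOLDS[-1][1]  # scores below every threshold get the lowest level
```

For example, a `"financial"` file with sensitivity 0.9 combines to 0.85 and maps to `"L3"`. A `"personal_identity"` file with sensitivity 0.2 combines to 0.55 and maps to `"L2"`, so the category default can lift a file that the model scored low.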
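Claim 10 trains a decision tree on feature vectors and reads the tree's identification logic back out as the detection logic. A classifier cannot learn a boundary from positive samples alone, so the sketch below assumes that negative samples are also available: feature vectors of files that should not trigger the rule. It implements a small Gini-based, CART-style tree in plain Python rather than any specific library. The feature names and data are hypothetical. Claim 11's additional features (entity count, recommendation level, file identifier) would simply be further columns of `X`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None  # split feature index; None marks a leaf
    threshold: float = 0.0         # samples with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0                 # predicted class at a leaf (1 = trigger the rule)


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build(X: list[list[float]], y: list[int], depth: int = 0, max_depth: int = 3) -> Node:
    """Greedy binary tree: choose the split with the lowest weighted Gini impurity."""
    majority = int(2 * sum(y) >= len(y))  # ties resolve to the positive class
    if depth == max_depth or gini(y) == 0.0:
        return Node(label=majority)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no threshold separates the samples
        return Node(label=majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(
        feature=f,
        threshold=t,
        left=build([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
        right=build([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth),
    )


def export_rules(node: Node, names: list[str], conds: tuple[str, ...] = ()) -> list[str]:
    """One conjunction of conditions per root-to-leaf path that predicts class 1."""
    if node.feature is None:
        return [" AND ".join(conds) or "TRUE"] if node.label == 1 else []
    name = names[node.feature]
    return export_rules(node.left, names, conds + (f"{name} <= {node.threshold}",)) + export_rules(
        node.right, names, conds + (f"{name} > {node.threshold}",)
    )
```

With `X = [[0.9, 3], [0.8, 2], [0.85, 4], [0.2, 0], [0.3, 1], [0.7, 0]]`, the first three rows positive and the last three negative, and feature names `["sensitivity", "keyword_hits"]`, `export_rules(build(X, y), names)` returns `["sensitivity > 0.7"]`. That string is the exported logic that claim 10 would store as the detection logic information.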

Description

Technical Field

The present application relates to the field of data security, and in particular to a method, a system, an electronic device and a storage medium for generating a data detection strategy.

Background

At present, data classification and grading in the field of data security requires high-accuracy detection strategies to be produced automatically, so as to provide an automated, digital solution. In the related art, detection strategies are written by collecting keywords from the files to be processed, defining regular expressions and the like, so generating a detection strategy takes a long time. Moreover, as new types of files to be processed appear, substantial manual effort is needed to maintain the generated detection strategies, which is costly and slow to respond. As a result, data detection strategies are generated inefficiently.

No effective solution to the above problem has yet been proposed.

Disclosure of Invention

The embodiments of the present application provide a method, a system, an electronic device and a storage medium for generating a data detection strategy, so as to at least solve the technical problem of low generation efficiency of data detection strategies.

According to one aspect of the embodiments of the present application, a method for generating a data detection strategy is provided.
The method comprises: acquiring a file to be processed, wherein the file to be processed comprises log data to be detected that is generated by a client during operation; invoking a data processing model to determine a general category, a sensitivity index and keywords of the file to be processed, wherein the data processing model is obtained at least by training based on a large model, the general category represents a category having a general meaning, the sensitivity index represents the degree of sensitivity of text in the file to be processed, and the keywords describe the text structure of the text; determining a recommendation level of a detection rule to be generated based on the general category and the sensitivity index, generating a rule name of the detection rule based on the keywords, and generating detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords, wherein the detection rule is a rule for performing security detection on the client, the recommendation level represents the priority with which the detection rule is recommended, and the detection logic information represents the logic applied when the detection rule is executed; and generating a detection strategy of the client from the recommendation level, the rule name and the detection logic information, wherein the detection strategy represents the generated detection rule.

According to another aspect of the embodiments of the present application, another method for generating a data detection strategy is provided.
The method comprises: acquiring a file to be processed from a data asset of a client, wherein the category of the data asset corresponds to a data usage scenario of the client, and the file to be processed comprises log data to be detected that is generated by the client during operation in the data usage scenario; invoking a classification model to classify the file to be processed to obtain a general category, invoking a sensitivity evaluation model to perform sensitivity evaluation on the file to be processed to obtain a sensitivity index, and invoking a data recognition model to extract keywords from the file to be processed, wherein the classification model is obtained by training a corresponding deep learning model with general category samples, the general category represents a category having a general meaning, the sensitivity evaluation model is obtained by training a corresponding deep learning model with sensitivity index samples, the sensitivity index represents the degree of sensitivity of text in the file to be processed, the data recognition model is obtained by training based on a large language model, and the keywords describe the text structure of the text; determining a recommendation level of a detection rule to be generated based on the general category and the sensitivity index, generating a rule name of the detection rule based on the keywords, and generating detection logic information of the detection rule based on at least the general category, the sensitivity index and the keywords, wherein the detection rule is a rule for performing security detection on the client, the recommendation level represents the priority with which the detection rule is recommended, and the detection logic information represents the logic applied when the detection rule is executed; and generating a detection strategy of the client from the recommendation level, the rule name and the detection logic information, wherein the detection strategy represents the generated detection rule.